arXiv: 1506.0848Ivl [physics.comp-ph] 29 Jun 2015 


Spectral Ewald Acceleration of Stokesian Dynamics for polydisperse 

suspensions 


Mu Wang"’*, John F. Brady" 

‘^Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA 


Abstract 

In this work we develop the Spectral Ewald Accelerated Stokesian Dynamics (SEASD), a novel computational method 
for dynamic simulations of polydisperse colloidal suspensions with full hydrodynamic interactions. SEASD is based 
on the framework of Stokesian Dynamics (SD) with extension to compressible solvents, and uses the Spectral Ewald 
(SE) method [Lindbo & Tornberg, J. Comput. Pliys. 229 (2010) 8994] for the wave-space mobility computation. To 
meet the performance requirement of dynamic simulations, we use Graphic Processing Units (GPU) to evaluate the 
suspension mobility, and achieve an order of magnitude speedup compared to a CPU implementation. Eor further 
speedup, we develop a novel far-held block-diagonal preconditioner to reduce the far-held evaluations in the itera¬ 
tive solver, and SEASD-nf, a polydisperse extension of the mean-held Brownian approximation of Banchio & Brady 
[7. Chem. Phys. 118 (2003) 10323]. We extensively discuss implementation and parameter selection strategies in 
SEASD, and demonstrate the spectral accuracy in the mobility evaluation and the overall 0{N log N) computation 
scaling. We present three computational examples to further validate SEASD and SEASD-nf in monodisperse and 
bidisperse suspensions: the short-time transport properties, the equilibrium osmotic pressure and viscoelastic mod¬ 
uli, and the steady shear Brownian rheology. Our validation results show that the agreement between SEASD and 
SEASD-nf is satisfactory over a wide range of parameters, and also provide signihcant insight into the dynamics of 
polydisperse colloidal suspensions. 

Keywords: Stokes how, Stokesian Dynamics, Brownian Dynamics, GPU computation, Ewald summation, spectral 
accuracy, colloidal suspensions, polydispersity 


1. Introduction 

Colloidal suspensions are dispersions of small particles in a viscous solvent, and are found in almost every aspect 
of our life, ranging from dairy milk to printer ink. They have two distinguishing features: (i) Brownian motion of 
the particles due to thermal Huctuations, and (//) the long-range, non-pairwise-additive hydrodynamic interactions 
(His) mediated by the solvent. As a result of these features, dispersions exhibit many surprising behaviors such as 
non-Newtonian rheology, glass transitions, phase transitions, etc., and have attracted extensive scientihc and engineer¬ 
ing interests [1]. Using monodisperse colloidal suspensions as a model system, signihcant understanding has been 
achieved through theoretical, simulation, and experimental studies. 

However, naturally occurring colloidal suspensions are seldom monodisperse, and particle size differences are 
often unavoidable. In addition, particle size disparity introduces phenomena otherwise not observed in monodisperse 
suspensions. Eor example, size polydispersity reduces suspension viscosity [2-4], softens and even melts colloidal 
glasses [5], and promotes particle segregation in pressure driven hows [6]. Apparently, these behaviors can only be 
understood by studying dynamics of polydisperse colloidal suspensions. 

In this work we develop a computational method based on the framework of Stokesian Dynamics [7] (SD) for fast 
and realistic dynamic simulations of dense, polydisperse colloidal suspensions, with a focus on suspension rheology. 
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Presently, theoretical and computational studies on polydisperse colloidal suspensions, even for the simplest case 
of neutrally buoyant hard-sphere particles, are scarce, and heavily focus on the dilute or the short-time limits [8- 
12]; the former restricts His to the two- or three-body level, and the latter ignores suspension dynamic evolution, 
particularly the influence of Brownian motion. Beyond these limiting cases, we are only aware of the work of Ando 
& Skolnick [13], who studied particle diffusion in dense polydisperse colloidal suspensions using conventional SD in 
the context of biological molecular crowding. Their implementation limits His to the force-torque level, and therefore 
is unsuitable for rheological investigations. 

A difficulty in dynamic simulations of dense colloidal suspensions is the singular His due to the lubrication inter¬ 
actions between close particle pairs. To directly resolve His, a computational method must capture the flow details 
in the small gap between particles. For multipole expansion based methods [7, 14, 15], a large number of expansion 
terms are necessary to achieve convergence, and for methods based on surface or spatial discretization, such as the 
boundary element method [16, 17] or direct numerical simulations [18-21], very fine meshing is needed in the gap. 
Directly resolving lubrication interactions drastically increases the computational cost and limits many studies to low 
volume fractions. For example, the force coupling method study of Abbas et al. [22] on the dynamics of non-Brownian 
bidisperse suspensions is limited to particle volume fractions below 20%. 

A solution to the above difficulty is the SD framework [7], which exploits the local and pairwise additive nature of 
lubrication interactions. In SD, the long-range, non-pairwise-additive His are computed from the mobility perspective 
using low-order multipole expansions, and for particles in close contact, lubrication corrections are added pairwise to 
the corresponding resistance formalism. The corrections are based on the solutions of two-body problems with the far- 
field contributions removed. In this way, SD avoids directly resolving the singular lubrication interactions. The idea of 
lubrication correction in SD is general enough for incorporation to other computational methods. For example, similar 
lubrication corrections has been developed for hydrodynamic multipole methods [14, 15, 23, 24], the force coupling 
method [25], the lattice Boltzmann method [26], and the fictitious domain method [27]. Moreover, with an appropriate 
fluid solver, the lubrication corrections can be improved beyond the pairwise level [28]. We feel that, by incorporating 
the lubrication corrections, many recent computational techniques can significantly extend their accessible parameter 
range without an increased computational burden. This point is demonstrated in the present work, which essentially 
combines the lubrication corrections and the Spectral Ewald (SE) method of Lindbo & Tornberg [29, 30] for dynamic 
simulations of dense polydisperse suspensions. 

The Spectral Ewald (SE) method is a new particle mesh technique for computing long-range electrostatic [30] 
or hydrodynamic [29] interactions, and has recently been incorporated into the boundary element method for soft 
particles [31]. Particle mesh techniques including the Particle Mesh Ewald (PME) method [32] and the Smooth Par¬ 
ticle Mesh Ewald (SPME) method [33] have been extensively used for calculating His with 0(N\ogN) computation 
scaling. Note that, although algorithms based on the fast multipole method [34] can achieve a better computation 
scaling-down to 0{N), they often have significant computation overheads, and require large system sizes to justify 
the complexity [35]. Therefore, for many dynamic simulations, the particle mesh techniques remain the practical 
choice. Notable examples are Accelerated Stokesian Dynamics (ASD) [36] which uses the PME method for the far- 
field mobility evaluation, and the work of Saintillan et al. [37], where the SPME method is employed to study fiber 
sedimentation. Compared to other particle mesh techniques, the SE method is spectrally accurate, and can separate er¬ 
rors from mesh interpolation and the wave-space truncation. Both features are essential for capturing the complicated 
His in polydisperse suspensions. 

Another challenge in dynamic simulations of colloidal suspensions is Brownian motion, which is configuration 
dependent due to the fluctuation-dissipation relation. When Euler-Maruyama time integration is used, the determin¬ 
istic particle drift due to the Brownian motion must also be included [38]. As a result, computing Brownian related 
quantities requires the gradient and the square root of the mobility tensor. Eortunately, these quantities can be evalu¬ 
ated in a matrix-free manner under the framework of ASD, making dynamic studies on hundreds of colloidal particles 
possible [39, 40]. Moreover, the mean-field Brownian approximation, which estimates the mobility tensor based on 
the near-field His, is able to further speed up the computations [39, 41]. In this work, these developments are fully 
incorporated for the dynamic simulation of Brownian polydisperse suspensions. Note that a different approach to treat 
the Brownian motion is based on fluctuating hydrodynamics [42], where the thermal fluctuations are directly incor¬ 
porated in the governing fluid equations. It has been applied to the lattice Boltzmann method [43], the force coupling 
method [44], and the immersed boundary method [45]. 

The emergence of the General Purpose Graphic Processing Unit (GPGPU) programming often brings significant. 
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sometimes orders of magnitude, speed improvements for many existing algorithms. Recently, Kopp & Hofling [46] 
implemented the conventional SD for infinite solvent using GPGPU with direct HI summation. Despite the 0{N^) 
scaling, they achieved impressive speedup over the CPU implementation. However, to study the dynamics of homo¬ 
geneous suspensions, further extension to periodic systems are necessary. On the other hand, GPU acceleration of the 
SPME method [47, 48] in molecular dynamics provides access to millisecond-scale dynamics on personal computers. 
These acceleration techniques are applicable to particle mesh techniques in general, and inspired the present work. In 
particular, we used GPGPU programming to compute the His with the SE method in homogeneous suspensions, and 
realized almost an order of magnitude speedup in dynamic simulations. 

Eurthermore, our computation method extends SD to compressible suspensions, allowing dynamic simulations 
of constant pressure rheology [49] without introducing geometric confinement. This is possible because the flow 
disturbances due to rigid particles in a compressible solvent are incompressible and satisfy the Stokes equation [50]. 
Another benefit of such extension is that the suspension normal stress, which is essential for particle migration in 
sheared suspensions [51-53], can be directly evaluated. 

The remainder of the paper is arranged as follows: Sec. 2 establishes the basic formalism for His in compressible 
Stokes flow. In Sec. 3, various aspects of mobility computations with the SE method are presented. Here, we also 
discuss different approaches to incorporate particle size polydispersity and the GPGPU implementation. In Sec. 4, 
we present the Spectral Ewald Accelerated Stokesian Dynamics (SEASD) and its mean-field Brownian approxima¬ 
tion, SEASD-nf, for dynamic simulations of Brownian polydisperse suspensions. In Sec. 5 we carefully discuss the 
accuracy and parameter selections for the SE method, and the computation scaling of various SEASD implementa¬ 
tions. Sec. 6 presents a series of validation calculations for monodisperse and bidisperse suspensions with SEASD and 
SEASD-nf: Sec. 6.1 addresses the short-time transport properties. Sec. 6.2 evaluates the equilibrium osmotic pressure 
and viscoelastic moduli, and Sec. 6.3 presents various aspects of the steady shear rheology of Brownian suspensions. 
The results also reveal the role of particle sizes in the dynamics of bidisperse suspensions. Einally, we conclude this 
work with a few comments in Sec. 7. 

2. Hydrodynamic interactions in (compressible) Stokes flow 

2.7. The mobility and resistance formalism 

We first consider a suspension of N spherical rigid particles, each with radius a, and position r,, in an incompress¬ 
ible solvent of viscosity t/q and density po, occupying a volume V. Eor the special case of bidisperse suspensions with 
particle sizes ai and a 2 , the suspension composition is fully characterized by three dimensionless parameters, 

A = aila2, (p - (p\ + <p2, andy2 = filf, (1) 

where A is the size ratio, f is the total volume fraction, and y 2 is the volume ratio of species 2. The species volume 
fraction is 0Q. = a e [1,2], and the species number density is Mq.. The total number density satisfies « = n\-¥n 2 , 

and the species number fraction is Xa = na/n. Without loss of generality, we take a 2 > ai. 

If the particles are sufficiently small, the particle Reynolds number Rep ^ = poacUalTjo <s; 1, where Ua is the 
species characteristic velocity. In this limit, the velocity field v(r) and the pressure field p{r) of the solvent satisfy the 
Stokes equation, 

Vp = V ■ V = 0, (2) 

supplemented by no-slip boundary conditions on particle surfaces. Due to the linearity of Eq. (2), there is a linear 
relation between the velocity disturbance on the surface of a particle i, and the surface force density of another 
particle j, fj, 

u'fr) - - fdr' ^ /; X) ■ (3) 

^ j 

where Mij{r,r'-,X) is a mobility operator depending on positions r and r' and the suspension configuration X - 
(ri,r 2 , ...]. The surface force density is localized on the particle surface, i.e., fj{r) - cr(r) • njb(||r|| - aj), where 
cr is the stress tensor, nj is the surface normal of particle j, and 5{x) is the Dirac delta function. The stress tensor 
cr = -pi + / 7 o[Vv -H (Vv)'], with f indicating transposition and / is the idem tensor. The velocity disturbance u'fr) - 
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Ui + ilj X (r - r,) - v“’(r), where v°°(r) is the ambient flow satisfying V ■ v“ =0, and Ui and fl, are respectively 
the linear and angular velocities of particle i. By stacking the force density vectors / = (/i,/ 2 , ■ ■ ■)^ and the velocity 
disturbance vectors u' - (Hj, ■ ■ ■)' the grand mobility operator M is constructed from elements M/j in Eq. (3), such 
that 

u'(r) dr'M(r, r'; X) ■ f(r'), (4) 

for the N particles in the suspension. Eqs. (3) and (4) are known as the mobility formalism, and the inverse relation is 
the resistance formalism, 

j'dr'R(r,r'-X)u'(r'), (5) 

where R{r, r'; X) is the grand resistance operator. 

The integral representations in Eqs. (4) and (5) can be equivalently expressed as multipole expansions of /(r) and 
u'{r), f and u' respectively, around the particle centers, i.e.. 



[F^^l 


FI' 

II 

T 


and u'{r) ^ u' = 

^CC 


where is the generalized hydrodynamic force, is the hydrodynamic stresslet, 'll' is the generalized velocity 
disturbance, and E°° is the rate of strain tensor for the ambient flow. Note that = (E^, T^)^, where and 
are respectively the particle hydrodynamic force and torque for all particles, and 'll' - {U — U°°, il - where 

1/ - l/“ and Q - Q” are respectively the linear and angular velocity disturbances. The hydrodynamic force, torque, 
and stresslet for particle i are defined as integrals of the localized surface force density /,, 


pf-- 

1 drfiir). 

(7) 


f' dr(r-n)xf,(r), 

(8) 


f dri[(r-r,)/,+/;(r-r,)]. 

(9) 


In Eq. (6) the ambient velocities are evaluated at particle centers, i.e., 


iV X v“ 


and E“ = 


|[Vv“ + (Vv“)^]r,. The expansions in Eqs. (4) and (5) lead to the following infinite dimension linear relation, 

f = -m(X) ■ u' and u' = -m(X) ■ f (10) 


where 9Jt(X) and 93(X) are the multipole grand mobility and resistance tensors of operators M(r, r'; X) and R{r, r';X), 
respectively. Evidently, = 93 ', and from the Lorentz reciprocal theorem [54], both are positive definite. 

The infinite dimension vectors f and u' can be reduced to finite dimensions by projection. To the stresslet level 
of f and the strain rate level of u', we introduce projection matrices F and Q, such that F ■ f = (F^, and 
<3 ■ u' = ('ll', Moreover, F ■'P^ = <3 ■ <3^ = I, where I is an identity matrix. The following linear relation 
holds: 

,and';? = M‘, (11) 

where A1 = (39HF' is the (exact) grand mobility tensor and F. - F93(3' is the (exact) grand resistance tensor. Eor 
convenience, the grand resistance tensor is partitioned as 





Rrtt 

Rsti 


Rrs 

Rse 


( 12 ) 


where, for example, R<r%( describes the coupling between the generalized force and the generalized velocity. The 
linear relation in Eq. (11) can also be deduced from the linearity of Eq. (2) without appealing to the multipole ex¬ 
pansion, but here we establish a connection with other works, particularly the multipole methods of Cichocki and 


4 














coworkers [15, 55]. Note that for rigid spherical particles, external flows can only affect the first two moments of f 
and u' due to symmetry and the no-slip boundary condition. 

Elements of 911 and 91 can be computed from, for example, the induced force multipole [56, 57], eigenfunction 
expansions [15, 24, 58], and multipole expansions [7]. To the stresslet level, 911 can be conveniently evaluated by 
combining the Faxen formulae and the multipole expansions. For a rigid particle i in an incompressible solvent, the 
Faxen formulae are [7], 


Ui - 17“ 

Q, - 

^OC 


F? 


6nriQai ^ ® J 'n 

T'H 

H- iVxv'l 


STTT/oaf ^ 
fnTjoaf 


■(l + >2v2)i[Vv' + (Vv')^]|^^, 


(13) 

(14) 

(15) 


where the overline indicates the traceless part of the symmetric tensor, and v'(r) is the velocity held in the absence of 
particle i. With the fundamental solution of Stokes equation J{r) and the force density /, the velocity held v'(r) can 
be computed as [54], 

v(r) = r dr'J(r - /) ■ /(/). (16) 

oJTr]o J 

Expanding the force density around particle centers, we have 


"'«= Z (> + J+ fi - (1 + ^ s; 


(17) 


where the prime on the summation excludes the case i — j, and the functions J, R, and K are evaluated at r-rj. In the 
Cartesian tensor form, R = Rap - \esyp{^yJad - '^sJay) and K - Kapy - \ V^yJap + ^pJay], with eapy the Fevi-Civita 
symbol. With Eqs. (13)-(15) and (17), the grand mobility tensor 9Jl for incompressible solvents can be constructed in 
a pairwise fashion. 


2.2. The fundamental solutions 

The formalism in Sec. 2.1 relies on J{r), the fundamental solution of Stokes equation. Different boundary condi¬ 
tions such as periodicity [59, 60], confinement [24, 61], or a combination of both [62], can be incorporated to J{r). 
For an infinite expanse of fluid, we have the well-known Oseen tensor, 

J{r)^-(l + rr), ( 18 ) 

r 

where r = ||r|| and r - r/r. 

To study dynamics of homogeneous suspensions, periodic boundary conditions are necessary to assess the His. 
In this case, the proper fundamental solution J{r) describes the fluid velocity disturbance due to an array of periodic 
forces F YjpS(r - Rp), where Rp - Pd^d is the location of the periodic forcing. Here, p - (p\,p 2 ,pf) e 
6{r) is the 3D Dirac delta function, and ai, a 2 , and aj, are the Bravais lattice vectors describing the spatial periodicity. 
From Fourier expansion of Stokes equation [Eq. (2)], we have for the periodic J(r): 

J(r) = -y(/V^ - VV) Yj y exp(-ik ■ r), (19) 

ki^O 

where i - V^, the unit cell volume V = ai ■ (ai x a 3 ), the wave vector k - jd^d is defined by the reciprocal 
vectors bi, ba, and b 3 , j - ( 71 , 72 , 73 ) g and - k ■ k. Writing the lattice and the reciprocal vectors as column 
vectors and defining matrices A - [aia 2 a 3 ] and B = [bib 2 b 3 ], we have = 27rA“* and exp(ik ■ Rp) = 1. By 
requiring k 4^ 0 in Eq. (19), the external forces are balanced by the pressure gradient [59], a necessary condition for 
convergent His [63]. 
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A difficulty associated with His is the long range nature of J{r), i.e., Eq. (18) decays as r ' in the real space and 
Eq. (19) as in the wave space. Eor periodic systems, however, the conditionally converging sum in Eq. (19) can 
be split into two exponentially fast converging series, i.e., 

J(r) = J«(r) + Jwir), (20) 


where Juir) is the real-space sum, Jwir) is the wave-space sum. Although the splitting in Eq. (20) is not unique [29], 
a particularly efficient scheme by Hasimoto [59] utilizes the integral 


L ^ ^2 r 
^ Jo 


and the Poisson summation formula. The result is 


jS exp(-;7r^^j8)d/?, (k + 0), 


Mr) - ^(/V^-VV) 
Stt 


rErfc(r^) 














( 21 ) 

( 22 ) 

(23) 


where ^ is the splitting parameter and Erfc(.r) is the complementary error function. The real-space sum Jr only covers 
the neighboring periodic cells. The parameter ^ is consistent with the convention of Beenakker[60] and satisfies 
47ra^^ = 1, where a is the splitting parameter introduced by Hasimoto [59]. 


2.3. Extension to compressible fluid 

The formalism in Sec. 2.1 is limited to an incompressible fluid, i.e., the imposed flow must satisfy V ■ v“ = 0. This 
requirement is relaxed by imposing a uniform rate of expansion everywhere in the fluid, such that V ■ v*” = E°°, and 
the fluid is assumed compressible with a bulk viscosity kq. The rigid particles, unable to expand with the compressible 
fluid, generate a velocity disturbance that satisfies the incompressible Stokes equation [50]. Erom the linearity of 
Stokes flow, this velocity disturbance can be superimposed with other flows in the suspension, extending the existing 
formalism to compressible fluids. 

For a rigid particle of radius a, located at r, = 0, the velocity disturbance due to a compressible flow with an 
expansion rate E°° is 

vflr) = (24) 

This isotropic flow disturbance generates an isotropic stress contribution. Introducing the pressure moment as the 
trace of the stresslet in Eq. (9), i.e., 

Sf jdr(r-ri)-Mr), (25) 

we have = -^nr}Qa^E°° from Eq. (24). Therefore, the velocity disturbance due to a pressure moment Sf at the 
origin is 

= = (26) 

1 67:770 r ^ 

Adding the compressible velocity disturbances Vs(r) from other particles to the incompressible velocity disturbance 
v'(r) in Eq. (17), the general velocity disturbance in a compressible suspension is 

v'flr)^v'(r)+J]'Q(r-rj)Sf. (27) 

j 

When applying the Faxen formulae [Eqs. (13)-(15)] in compressible suspensions, the velocity disturbance v', instead 
of v', is used. 

In addition to Eqs. (13)-(15), the Faxen relation for the pressure moment in a compressible fluid is [64, 65] 

=-yTT/yofl-E”-H 47ra^p'(r,), (28) 
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where p' is the pressure disturbance without the particle at r,. The pressure disturbance can be obtained from the 
pressure fundamental solution of Stokes equation. 


P(r) - 4’ 

such that the pressure distribution due to a force density is 

= /dr'P(r-r') ■/(/). 

For the pressure disturbance p' in Eq. (28), expanding the surface force densities leads to 

j 


(29) 


(30) 


(31) 


Eq. (28) is different from the Eaxen formulae in Eqs. (13)-(15) as it presents the pressure moment or trace of the 
stresslet on the left hand side. This subtle difference highlights a distinct feature of the compressible flow disturbances: 
in a compressible fluid, the pressure moment can cause particle movement satisfying the incompressible Stokes equa¬ 
tion, but the incompressible force moments cannot generate compressible disturbances. As a result, the interaction 
part of the pressure moment can only be evaluated after F?, T^, and Sf are known. Otherwise, the resulting hydro- 
dynamic interactions contain spurious contributions due to the unphysical coupling between the incompressible force 
moments and the compressible flow disturbances. 

To extend the above results for Vj and 5 to periodic boundary conditions, we note that the divergence of Q in 
Eq. (26) satisfies 

V ■ e = (32) 


since V^r ' = -Andir). This means that, for uniform expansion in compressible suspensions, the particles act as 
fluid sources, each with a strength proportional to its pressure moment. In a periodic system, the velocity disturbance 
corresponds to an array of sources are obtained by replacing the delta function in Eq. (32) with 2^ 6{r - Rp). Erom 
Eourier transform, the solution is 


Q(r)^ 


1 

4770V 




ikr 


(33) 


The above wave-space sum can be split to two exponentially converging series [30, 59] 


ki^Q 




(34) 


Similar to Q(r), the pressure fundamental solution P{r) in Eq. (29) can also be extended to periodic systems. 


3. The mobility computation 

The mobility problem seeks the action of the grand mobility tensor on the force moments such as and S^. 
It can be constructed in a pairwise fashion using the formalism in Sec. 2 for compressible suspensions. Naively, this is 
an O(N^) operation for an A-particle system since the long-range His necessitate considerations of all particle pairs. 
However, with the Ewald summation that splits the fundamental solutions J{r), Q{r), and P{r) into exponentially fast 
converging wave-space and real-space series, the particle mesh techniques can improve the computation scaling to 
0(N\ogN). In the following, our implementation of the mobility computation is discussed. 


7 



3.1. Wave-space computation: the Spectral Ewald (SE) method 

The wave-space computation concerns the part of grand mobility tensor associated with Jw{r) of Eq. (23) and the 
wave-space sum of Eq. (34) in P{r) and Q(r). Using the East Eourier Transform (EET) algorithm, the computation cost 
can be reduced to 0(N log N). To illustrate this, let us consider the wave-space velocity disturbance U'^ on particle i 
at the Rotne-Prager level, obtained by combining Eqs. (13), (17), and (23), i.e.. 




= Z (> - g.(« i; (‘ - 'z-y-y-'-i'. 




(35) 


and the wave-space kernel 

g,{k)^{\ + - kk). (36) 

Different from Eq. (17), the summation over particle j in Eq. (35) is unrestricted and includes the case of i - j. 
Therefore, the self interaction term for i - j, which is 

^(9-10^2^2+7^4^4) 

- 18;o.3/. (37) 

should be removed later. Eq. (35) exposes the basic idea behind many particle mesh techniques including the PME 
method and the SPME method. Prom an inverse Eourier transform, the real-space force distribution corresponding to 
the summation over j in Eq. (35) is 

^(l + ifl2v2)fH5(r-r,). (38) 

j 

The force distribution in Eq. (38) is assigned to a regular spatial grid by approximating the delta functions by La- 
grangian polynomials in the PME method [ 66 ] or Cardinal B-splines in the SPME method [33]. The interpolated 
forces are then transformed to the wave space by PET and the wave-space computation in Eq. (35) is performed. The 
wave-space results is then brought back to the real space by inverse PPTs. Subsequently, the velocity on each particle, 
U^, is interpolated back from the grid, preferably using the same interpolation scheme for the force assignment [67]. 
Here, the action of the mobility tensor on the force F^, rather than the tensor itself, is computed. The kernel Qiik) in 
Eq. (36) is effectively a low-pass filter that cuts off the spatial signals at high k. Computationally, for grid points 
the PET scales as 0(M^ log M^). To ensure reasonable accuracy, oc N, and the wave-space computation scales as 
0(N log N). 

There are two sources of error affecting the accuracy of particle mesh techniques. The first is associated with the 
truncation of the wave-space sum (k-summation) in Eq. (35). This is only affected by the number of grid points M 
in the simulation box. The second error is the interpolation error, and arises from polynomial approximation of the 
b-functions in Eq. (38). Por a simulation box of size L, this error scales as (LIMY, where p is the polynomial order 
of the approximation scheme. Since both errors are associated with M, we cannot separate the two error sources. 
Consequently, to maintain a satisfactory overall accuracy, a large M is often used in the wave-space computations to 
keep the interpolation error small, resulting in unnecessary PET computations. 

In addition, for polydisperse suspensions, different particle sizes introduce additional complications to traditional 
particle mesh techniques. If the Laplacian in Eq. (38) is computed in the real space in the SPME method, the inter¬ 
polation error increases to (LIM)p^^, which further increases the M requirement. Por the PME method, real-space 
differentiation is unsuitable due to the discontinuity of Lagrangian polynomials, and all the computations have to be 
carried out in the wave space. This significantly increases the total number of PPTs. In addition, different particle 
sizes increase the complexity in the algorithm implementation. Therefore, a simple method with flexible error control 
is crucial for accurate and efficient wave-space computation in polydisperse systems. 

To address these concerns, we use a new particle mesh technique, the Spectral Ewald (SE) method [29-31] for 
the wave-space mobility computation. The SE method decouples the k-space truncation and interpolation errors, 
and is accurate, efficient, and flexible for polydisperse systems. To show this, we use Eq. (35) again as an example 
and consider the general case of non-orthogonal lattice vectors. We first introduce the fractional coordinate t - 
( 6 U 2 U 3 )*^ g [0,1)2. Por each point r in the simulation box, r - t\Sti + f 2 a 2 -i- f 3 a 3 - A t. Accordingly, defining 
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q - (_q\,q 2 , ^ 3 )^ such that k - ^ibi + q' 2 b 2 + Q'abs - B ■ q, exp(i^ ■ r) - e.yjp{2niq ■ t), and - q^ ■ ■ B ■ q. Eq. (35) 

is rewritten in t and q as 


Tj (1 - \a]q^ ■ B'' ■ B ■ g^iB ■ q) 

q*0 

■ ^ (1 - la]q^ ■ B'- ■ B ■ 


(39) 


with two e ■ multiplied after particle positions and one ' before Qi, and 0 is a parameter. Introducing the 
Fourier transform pair 

r . . r . , . 


/, = J df/(f)e2-«'and/(f) = J dqf,e-^’^“>‘. 


the basic idea of SE is to note that 


h{t) - I dqe 


/< 


-2mq-t-\9q^F- _ ( 


0 


exp 


.2i:2 


STT^f 

¥ 


(41) 


i.e., the the Fourier transform of a Gaussian remains a Gaussian, and the shape of the Gaussian is controlled by 6. 
Here, || • ||, indicates distance computation using the minimum image convention for periodic systems. The inverse 
Fourier transform of the second line of Eq. (39) with respect to q is 


H«) = 2 (‘ + ■ B* ■ B ■ 


(42) 


where V, = ( 5 / 5 fl, 5 / 5 f 2 ,l 9 /( 9 f 3 )^ Eq. (42) facilitates interpolation of a discrete force distribution onto a uniform 
grid of coordinate t via the Gaussian shape function h{t) in Eq. (41). The effect of particle size is automatically 
incorporated in the grid assignment scheme in the real space. After converting the real-space H{t) to the wave-space 
Hq using FFTs, the wave-space computation produces 




I et” 

lo 


9i{B q) Hq, q + 0 


otherwise. 


(43) 


From Parseval’s theorem. 


£dtf(t)g*(t) ^YjfqSr 


(44) 


where T is a periodic lattice and (■)* indicates complex conjugation, Eq. (39) becomes a convolution integral with the 
Gaussian shape function. 


where G{t) is the inverse Fourier transform of Gq. Extending the SE method to couplings beyond Rotne-Prager level 
is straightforward, with adjusted H{t) and G{t) based on the Faxen laws and multipole expansions in Sec. 2. In this 
work, we have implemented the mobility computation to the stresslet and the strain rate level. 

Unlike other particle mesh techniques, the SE formulation in Eqs. (39)-(45) is exact and therefore the errors are 
entirely from the numerical implementations. Since the EFT algorithm is accurate to machine precision, the sources 
of error include the discretization and truncation of the shape function [Eq. (41)], and the numerical integration in 
Eq. (45). Practically, the evaluation of each shape function is limited to points {P < M) around the particle. 
Due to the exponential decay of h{t), the truncation error decreases exponentially with increasing P. Meanwhile, the 
integral in Eq. (45) is evaluated using trapezoidal quadrature [29, 30], which also exhibits exponential error decay 
with increasing P. Therefore, the interpolation error in SE method depends exclusively on P for sufficiently large M, 
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and can be separately controlled from the A:-space truncation error. The rapid, exponential error decay is known as 
spectral accuracy [29, 30], and this is the namesake of the SE method. 

The computation cost of the SE method also becomes apparent with the truncation of h{t). The grid assignment in 
Eq. (42) and the convolution Eq. (45) are 0{NP^) for an A^-particle system, and the EETs to and from the wave space 
are 0[M^ log(M^)]. With oc N, the time limiting step is the EET, and the SE method also scales as 0{N log N) as 
other particle mesh techniques. 

The Gaussian shape in !i(t) of Eq. (41) is controlled by 0, which is parameterized as 


0 = 



(46) 


on a regular grid of points with points for each shape function evaluation. The shape parameter m in Eq. (46) 
ensures that at the edge of h{t) evaluation, i.e., = f’^/(2M)^, h oc Therefore, with fixed M and P, m 

describes the truncation of h{t) on the discretized grid and is consistent with the original SE method of Lindbo & 
Tornberg [29, 30]. 

The computation efficiency of the SE method relies on rapidly computing the O(NP^) different Gaussian shape 
functions !i{t), which involves expensive exponential evaluations. To reduce these expensive operations, Lindbo & 
Tornberg [29, 30] introduced the fast Gaussian gridding (EGG) technique [68] to the SE method. In essence, the EGG 
technique evaluates the exponential function on a regular grid as 

where or is a constant, 6t is the off-grid value. At is the spacing of the regular grid, and i is an integer within the 
range [-P/2, P/2]. It reduces the P exponential evaluations in each direction in the SE method to 3 exponential 
computations and at most 2P multiplications. In addition, the last term of Eq. (47) is independent of 6t, and therefore 
only needs to be computed once. 


3.2. Wave-space computation: the particle size effect 

In Sec. 3.1 the terms associated with finite particle sizes in the Eaxen laws and the multipole expansions are 
incorporated in the real-space derivatives of the shape function h{t). Eor example, in a simple shear flow with lattice 
vectors ai = (L, 0,0), ai = {yL, L, 0), and as = (0,0, L), where y is the strain, the relevant term in Eqs. (42) and (45) 
is 


B^ B V,)h(t) = 

I {-0(3 + r") + + r)fT + - 2 ytit 2 ]}h(t). (48) 

The finite particle sizes introduce additional features to the shape function, and for non-orthogonal simulation boxes, 
non-trivial anisotropy. As a result, compared to the case of point forces, more points P are needed to resolve the 
details in Eq. (48). On the other hand, the benefit of evaluating the particle size effects in the real space is that fewer 
EETs are involved. To compute the mobility problem of compressible suspensions to the stresslet and the strain rate 
levels, only four pairs of EETs are necessary: three are associated with Jw in Eq. (23), and one associated with the Q 
in Eq. (26). 

Alternatively, the particle size effect can be completely accounted in the wave space. This requires, for each 
particle j, Fj, T^, and Sj, as well as a^jFj and to be separately interpolated to the grid via h{t) and brought to 

the wave space for computation. The derivatives associated with the Eaxen laws and multipole expansions in Sec. 2 
are carried out in the wave space as multiplication of wave vectors. The final results are then combined from different 
convolutions and weighted by the particle sizes. To demonstrate this, we again take the wave-space Rotne-Prager 
velocity, Eq. (39), as an example. In this approach, the grid assignment is split into two parts, 

H'(t) = ^ h{t - tj)F^ and H''(t) = ^ h(t - tj)a]F^. (49) 

j j 
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The wave-space computation for q ^ Ois also split as 



( 50 ) 


(51) 


and G'^ = G” = 0 when q -0. The wave-space velocity disturbance is a sum of two convolutions, 


l/f' = — r dtG'(t)h(t - t,) + — f dtG''(t)¥t - ti). 



(52) 


Note that the convolution associated with G”{t) is weighted by the particle size a,. Compared to the other approach, 
the wave-space computation is rather straightforward for the force interpolation and convolution. With the same P, the 
accuracy is expected to be higher as the derivatives are calculated in the wave space [67]. However, the computation 
burden is shifted to the FFTs: for the mobility problem to the S and E level, a total of 20 pairs of FFTs are necessary: 
12 for FJ, TJ, and Sj, three for a^jFj, and five for the traceless part of a^jSj. 

A third approach, a hybridization between the wave- and the real-space approaches above, aims to reduce the errors 
associated with the high order derivatives of h{t) in the real space. It retains the real-space derivatives in the force 
interpolation step, but when evaluating the Faxen laws, the second order derivatives are computed in the wave space 
for improved accuracy. The first order derivatives are computed in the real space to keep the total number of FFTs 
low. As a result, this hybrid approach requires 12 FFTs: four to the wave space and eight from the wave space. Taking 
Eq. (39) again for example, the most significant error in Sec. 3.1 is due to applying the operator (Vj ■ ■ B - V,) twice 

to h(t), once during the force interpolation, and another time during the convolution. The hybrid approach retains the 
real-space grid assignment using H(t) in Eq. (42), but evaluates the convolution using Eq. (52) with modified G'(t) 
and G''(t): in the wave-space computations, the content in the square bracket on the right hand side of Eqs. (50) and 
(51) is replaced with Hg in Eq. (42). We adopted this hybrid approach in this work to compute the His, and discuss 
the accuracy of various approaches in Sec. 5.1. 

3.3. Real-space computation 

The real-space contributions to the grand mobility tensor DJI are computed pairwise using the formalism in Sec. 2. 
Since Juir) [Eq. (22)] decays exponentially fast with distance, when the parameter^ is sufficiently large, only particle 
pairs within a cutoff distance need to be evaluated. Introducing the cutoff radius for pair evaluation allows 
fast neighbor searching algorithms such as the linked list [69] or the chaining mesh [70] method to be used. These 
methods divide the simulation box into cells of size slightly larger than r^, and sort the particles into the cells. To find 
the neighbors of a particle, only particles in the residing cell and its 26 neighboring cells need to be searched. This 
effectively improves the operation count to 0{N log N) for the real-space computations. 

To accommodate the iterative scheme for HI computations in Sec. 4, the real-space grand mobility tensor is con¬ 
structed as a sparse matrix at each time step. After the matrix construction, the action of the real-space contributions 
to DJI is simply a matrix-vector multiplication. Otherwise, neighbor searching and pair HI evaluations need to be 
carried out at every iteration. Note that we also include the self-contributions from the wave-space computations, e.g., 
Eq. (37), and the self-part of the pressure Eaxen law [Eq. (28)], in the real-space grand mobility tensor. 

3.4. GPGPU acceleration of the mobility computation 

The mobility computation with the SE method was first implemented on CPU and the performance was unsat¬ 
isfactory for dynamic simulations. The bottlenecks are the force interpolation step and the convolution step. These 
are common speed limiting steps in particle mesh techniques due to ineffective memory caching between the particle 
and the grid data. Eor polydisperse systems in this work, the situation is aggravated as more interpolation points P 
are needed for satisfactory HI resolution. After a few optimization iterations on CPU, we realized that the key to the 
performance is the memory bandwidths. Since modern GPUs typically have significantly higher memory bandwidths 
compared to CPUs, in this work the entire mobility computation is carried out on GPU using CUBA C, a popular 
GPGPU programming model with a relatively mature environment for scientific computations. 

The GPU mobility computations are carried out in Single Precision (SP) for the highest GPU performance. The 
cost of the performance in SP computation is the accuracy, as the SP arithmetics can be severely limited by the number 
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of significant digits compared to the Double Precision (DP). However, this is not a problem in this work for at least 
three reasons: (/) For dynamic simulations with iterative solvers, the SP accuracy is often sufficient; (ii) The SE 
method is able to reach the round-off error of the SP arithmetics with proper parameter selection due to its spectral 
accuracy; and (Hi) The far-held His captured by the mobility computations are smooth compared to the lubrication 
interactions, which are evaluated in DP on CPUs. The split of the near- and far-held His in SD allows a natural mixed 
precision HI computation that captures the most signihcant contributions from each part. 

The GPGPU computations exploit the massively parallel structure of modern GPUs by simultaneously executing 
a large number of similar tasks, or threads, on the data. To maintain performance, data dependencies and commu¬ 
nications between threads should be minimized. This makes the GPU implementation of the SE method different 
from its CPU counterpart. Inspired by earlier GPU implementations of particle mesh techniques, this work combines 
the grid-based method of Ganesan et al. [47] for force interpolation and the particle-based approach of Harvey & 
De Eabritiis [48] for convolution. The grid-based force interpolation keeps a list of contributing particles for each grid 
point, and the list is updated when the particle conhgurations are changed. The grid values are computed in parallel 
using threads: with the particle list, each thread sums the force, torque, and stresslet contributions independently 
for each grid point. On the other hand, the particle-based convolution is a weighted summation on grid points 
for each particle. To maximize parallelization, the summation for each particle is performed by a group of P threads 
cooperatively. Each thread in the group first sums P^ grid points on the transverse plane, and for the final result, the 
first thread in the group adds up the values from other threads using the shared memory of the GPU. Moreover, on the 
GPU we use the cuf f t package for the EETs and the cusparse package for the sparse matrix-vector multiplication. 


4. Dynamic simulation with Stokesian Dynamics 

The framework of SD [7, 63] approximates the projected grand resistance tensor 7? in Eq. (12) as 

7? = (53) 

where DJI is the multipole grand mobility tensor, and 7?"^ is the pairwise additive lubrication correction without the 
far-held contributions. Recall that the inversion of DJI captures the many-body aspect of His, and the short-range 
correction 7?"^ captures the lubrication effects. The SD recovers the exact result for two-body problems and agrees 
well with the exact solutions of three-body problems [71]. It can provide signihcant insights to the His of dense 
suspensions [72, 73]. 


4.1. Iterative computation of hydrodynamic interactions 

We incorporate the SE mobility computation into the framework of SD using the iterative scheme of Swan & 
Brady [62], and call the resulting method the Spectral Ewald Accelerated Stokesian Dynamics (SEASD). Here, a 
matrix-free iterative scheme is necessary as the grand mobility tensor DJI is not explicitly constructed. The iterative 
scheme splits the overall hydrodynamic force, 

= -Rr^f ■ ^(^ + RrE ■ (54) 

where is the velocity disturbances due to His, into a near-held part and a far-held part. The near-held part satishes 

0 = H- p, (55) 


where R^<k is the coupling in 7?"^ and is stored as a sparse matrix, + R'^^ ■ E°° contains the interparticle 

force and the near-held contributions from E°°. The far-held hydrodynamic force satishes 


^CX) 


-m- 


■jrH.ft 
SH,fF , 


(56) 


where ® is the far-held stresslet from His. 
stresslets are 


SH,ff 


= ■ 


Solving Eqs. (55) and (56), the far-held hydrodynamic forces and 


^(TiH - J) 


0 



(57) 
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where 


Wl = 


J - AM) ■ 




(58) 


To ensure invertibility, a diagonal matrix AI, with A a parameter, is added to i-e., 

accordingly A convenient choice for A is 67n]oa, where a is the reference particle radius [62]. 

Solving Eq. (57) requires nested iteration as each evaluation of 21t contains the solution of the near-held problem 
with The near-held problem is efficiently solved by the Generalized Minimum Residual (GMRES) method with 
an Incomplete Cholesky preconditioner with zero hll-in (ICO) [74]. To reduce the ICO breakdown, prior to applying 
the preconditioner particles are reordered using the reverse Cuthill-McKee algorithm. For isotropic suspensions, the 
near-held problem typically converges to an error of 10 within 10 iterations [36]. For suspensions with strong 
structural anisotropy, however, the convergence becomes more difficult and the ICO preconditioner breaks down even 
with the reordering. This is resolved by increasing A in or introducing a threshold value die in during the ICO 
preconditioner computation [74]. Increasing A in R'^^ does not change the convergence of the near-held problem, 

but increases the number of expensive 9J1 iterations. On the other hand, increasing ffic deteriorates the quality of the 
ICO preconditioner and increases the iterations required for the near-held problem, but has little effect on the far-held 
evaluations. In dynamic simulations, both A and djc are adjusted for optimal computation efficiency. 

The pressure moment computation in SEASD also follows the near- and far-held splitting scheme in Eqs. (55) 
and (56). Due to the special coupling between the pressure moments and other force moments in compressible 
suspensions (Sec. 2.3), the interaction contribution to the far-held pressure moment is evaluated after ® and the 
traceless part of are solved in Eq. (57). On the other hand, the near-held part of the pressure moment is evaluated 
along with other parts of the stresslets using the near-held resistance functions. 

The near-held pairwise lubrication corrections 9?"^ are based on the exact solutions of two-body problems in series 
form [64, 65, 75, 76] up to where s - IrUai+aj), with ai and aj the radii of the pair, is the scaled particle center- 
center distance. In the simulations, the lubrication corrections are activated when i < 4: for i > 2.1 the interpolation of 
tabulated data and for i < 2.1 the analytical expressions are used. Note that constructed from two-body problems 
contains both the relative and the collective motions of the particle pair and, as pointed out by Cichocki et al. [23], 
the lubrication corrections corresponding to the collective motion can destroy the far-held asymptotics beyond the 
pair level. However, for dense suspensions, this only leads to a minor quantitative difference on the suspension static 
properties [11] in conventional SD. Therefore, we retain the full lubrication correction here for consistency with the 
existing SD framework. The SD implementations of Ando 8c Skolnick [13] removed the pair collective motion in the 
lubrication corrections. 


4.2. Far-fieldpreconditioner 

Here we introduce a preconditioner for DAI to reduce the number of expensive far-held mobility evaluations when 
solving Eq. (57). Since is not explicitly constructed, the preconditioner needs to be built from a suitable approx¬ 
imation. For mobility problems without the lubrication corrections, Saintillan et al. [37] and Keaveny [44] found 
substantial iteration improvement even with the diagonal mobility approximation. Unfortunately, the approximation 
of dJl is more involved due to the presence of {R^^Y^. In this work, a block diagonal approximation of 9JT for the 
far-held preconditioner is adopted. First, the near-held resistance tensor R'^,^ is approximated by N blocks of 6 x 6 
submatrices along its diagonal. Using the direct sum notation, this is ®^i(/?^.^)„, where ^ is the direct sum, and 
{Rf^) ij is the block submatrix between particles i and j in To approximate we use 


N 

^ (59) 

1=1 

which only involves N inversion of 6 x 6 matrices. The mobility tensor 9JI is approximated by its block-diagonal 
components using direct Ewald summation, i.e., for each particle, the approximation only considers the interactions 
with its periodic images. To obtain the preconditioner, we apply the Incomplete LU decomposition with zero hll-in 
(ILUO) [74] on the approximated 311, which is constructed following Eq. (58) with the approximated (R^^Y^ and 311. 
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Figure 1: The number of far-held iterations, i.e., the number of the grand mobility tensor evaluations, as a function of the GMRES residual with 
(solid line) and without (dashed line) the far-held preconditioner for a bidisperse suspension of = 200, A = 2, X 2 = 0.3, and (f) = 0.2. 


Unlike Saintillan et al. [37], including close pair interactions has an adverse effect on the preconditioner due to the 
diagonal approximation of 

The effectiveness of this preconditioner on the far-held iteration is demonstrated in Fig. 1. In this case, the His 
corresponding to random forces and strain rates are solved for a random bidisperse suspension of 200 particles with 
A - 2, X 2 - 0.3, and (p — 0.2. The far-held preconditioner substantially reduces the number of GMRES iterations. 
Evidently, its usage is justihed when the required GMRES residual is small, since constructing the approximate 
and the ILUO decomposition also take time. In dynamic simulations, further time saving can be achieved by 
updating the preconditioner every few time steps. In addition, the exact break-even time also depends on the far-held 
mobility computation parameters, including M, P, and that indirectly affect the iterative solver. Einally, since the 
preconditioner construction is an 0{N) operation and the evaluation scales as 0(N log N), preconditioning is almost 
always justihed for large systems. 

4.3. Dynamics simulation of Brownian suspensions 

Particle dynamics in a suspension are described by the generalized A^-body Langevin equation, 

m~ + r^ + r^ ( 60 ) 

df 

where m is the generalized mass/moment of inertial matrix, 'Ll is the generalized particle velocity and and 

■F® are the forces on particles. The hydrodynamic force arises from the His and can be computed from Eq. (54). 
The interparticle force F® originates from the interparticle potentials. The Brownian force F® is due to thermal 
huctuations in the solvent, and from the huctuation-dissipation theorem [77], F® satishes 

F®(f) = 0 and F®(0)F®(f) = 2kYiT6{t)Rrv, (61) 

where the overline denotes an average over the solvent Huctuations and k^T is the thermal energy scale. 

The conhguration evolution is obtained by integrating Eq. (60) twice over an appropriate time scale Af, and the 
result is [38, 78] 

AX = ■ [RrE ■ + F'’)] Af + kerV ■ Rj^^At + AX®, (62) 

where AX is the suspension conhguration change over time Af, 'Z/°° is the generalized velocity from the imposed How, 
and AX® is the Brownian displacement which satishes 

A^ = 0 and AX® AX® = 21(^1 AtRp^. (63) 
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The second term on the right hand side of Eq. (62) is the deterministic drift due to the configuration dependent 
Brownian force and the divergence operator is acting on the last index of The divergence can be numerically 
evaluated following Banchio 8c Brady [39]. 

The suspension bulk stress is obtained by spatially averaging the Cauchy stress [50, 51], i.e., 

{1} = -{p)tl + 2 tjo(E^) + (ko - |/7o)£“/ - nksTI + n«S^) + {S^} + <S®», (64) 

where {p)f is the average solvent pressure, (■) is the volume average over the entire suspension, ato is the fluid bulk 
viscosity, and n is the particle number density. The particle stresslets are broken down as + S®, 

where S® is the contributions from the the imposed flow, S® from the interparticle potential, and S® from the Brownian 
motion. Their suspension averages are expressed in resistance tensors 

<S®) = - </?s« ■ Rr'ii ■ RrE - Rse), (65) 

{S^}^-{(Rs<u-Rj}^ + rl)-F^}, ( 66 ) 

(S®) = - kB7’<V-(/?str ■ /?^!„)>, (67) 

where the divergence in Eq. (67) is applied to the last index in the bracket. Eor hard-sphere suspensions, (S®) = 0 as 
the HI and the interparticle force contributions exactly cancel each other [51]. The Brownian stresslet (S®) can also 
be computed using the modified mid-point scheme [39]. 

In dynamic simulations, the Brownian displacement AX® is evaluated from the Brownian force 'F® in Eq. (61) as 

AX® = Ry}^ ■ F®Af. (68) 

Eollowing Banchio & Brady [39], the Brownian force can be split into a near-fleld part and a far-fleld part, 

F® = F®’"^ H- F®’". (69) 

Both F®'"^ and F®'*’^ have zero mean and satisfy 

orB.nfrrrB.nf f 

^ ^ ~ At 

F®’ffF®-ff 

'jrB.Srp'B.nf —Q 

where (931 *)^i/ is the FT/ block of the inverted far-fleld grand mobility tensor. The pairwise-additive lubrication cor¬ 
rections allow pairwise evaluation of the near-field Brownian force F®’"^ [39]. Since 931 is not explicitly constructed, 
to compute F®’*®, it is necessary to solve 

= ^^(931^1/“) ■ T, (73) 

At 

where T is a Gaussian noise of zero mean and unit variance, and AS® is the fluctuation part of the Brownian stress in 
Eq. (67). The inverse square root of the grand mobility tensor 931 in Eq. (73) can be approximated using Chebychev 
polynomials with eigenvalue estimations [39, 79], or solved as an Initial Value Problem (IVP) [40, 80], which was 
first used by Swan & Brady [40] in ASD. The solution of the following IVP [81] with matrix A, 

djkT 

— - -i [tI + (1 - t)A]-‘ -{A-D-x, ;r(0) = c, (74) 

dr ^ 

at T = 1 satisfies ■ c. Swan & Brady [40] devised a numerical scheme to solve Eq. (74) in ASD: at each 

time step with step size At, Eq. (74) is marched first with a Euler forward half-step then a Euler backward half-step, 
i.e., 

,7' = - 5 [Til + (1 - t,)A]-* ■ (A - /) ■ (75) 

At/2 ^ 


AS® 


(70) 

(71) 

(72) 
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Xi+\ 

a772 


- \ [tmI + (1 - t,+i)A]-‘ ■ (A - /) ■ Xm. 


(76) 


With A = 9Jl and c = (2k-aTlAtyV, Eq. (73) is solved at r = 1. In SEASD, both Eqs. (75) and (76) are solved 
iteratively, usually with a smaller tolerance compared to At. The results with Ar = 0.1 are often satisfactory. 

Eor dynamic simulation of Brownian suspensions under a simple shear flow with strain rate y, the ratio of the 
convective transport rate y and the diffusive transport rate k^Tf{6nrioa^) defines the Peclet number, 


Pe = 


bnrjoaly 

ksT 


ill) 


Small Pe indicates Brownian motion dominance, and large values suggest negligible Brownian influences. Eor bidis- 
perse suspensions, we define Pe based on the size of the small particles to capture the dynamics of the most rapid 
changes, i.e., flp = a\. In dynamic simulations, the time in Eq. (62) is scaled according to the Peclet number: when 
Pe < 1, it is scaled with the diffusive time scale of the small particles, 6nT]Qay {k^T), and when Pe > 1, the convective 
time scale y~^. 


4.4. The mean-field Brownian approximation 

The most time-consuming step in dynamic simulations of Brownian suspensions is computing from Eq. (73) 
due to the large number of DJI evaluations, although the IVP approach in Sec. 4.3 is expected to be faster than the 
Chebychev approximation [40]. Eurther speed improvement is possible by introducing a mean-field approximation 
of the Brownian-related quantities [39]. In this approach, the far-field grand mobility tensor DJI is approximated as 
a diagonal matrix for all Brownian related computations, and the full HI computations are retained for the flow- 
related quantities such as S^. As a result, this method retains the 0(N\ogN) scaling, but with an order of magnitude 
smaller prefactor for monodisperse suspensions [39]. The diagonal approximation of DJI uses the single particle result 
for the ES coupling, and the far-field translational and rotational short-time self-diffusivities for the 'ICF coupling. 
These far-field values are from Monte-Carlo computations of equilibrium configurations at the same volume fraction 
without the lubrication corrections. Extending this approach to polydisperse suspensions is trivial: the suspension far- 
field diffusivities in the diagonal elements are replaced by the far-field diffusivities for each species. The mean-field 
Brownian approximation is especially suitable for studying dense suspension rheology, where the His are dominated 
by the near-field lubrication interactions. Eollowing Brady &. Banchio [39], we designate this approximation scheme 
SEASD-nf 


5. Accuracy and performance 


5.1. Mobility computation accuracy 

The accuracy of the mobility computation is characterized by the relative oo-norm of the strain rate, i.e., 


^oo,r(E') 


max 


IIEf - £*11 

l|£*ll 


(78) 


where the Ef^ is the particle strain rate from the SE method and E* is a well-converged value from direct Ewald 
summation. Other error measurements can be similarly defined. Eor example, eoafiU) for the linear velocity was used 
by Lindbo 8c Tornberg [29] to characterize the accuracy of the SE method for point forces. Eor the stresslet-strain rate 
level mobility computation here, we found eoafiE) the most stringent error criteria, possibly because more derivatives 
are involved in Eq. (15). 

To facilitate quantitative discussions, in this section we focus on a random bidisperse hard-sphere system of 
N - 50, (p - 0.05, A - 2, and X 2 = 0.3. The imposed force, torque, and stresslet on each particle are randomly drawn 
from a normal distribution, and rescaled to ensure ||f ,|| = 1, ||7',|| = 1, and ||S,|| = 1. The the simulation box lattice 
vectors are ai = (L, 0,0), 32 = (yL, L, 0), and 33 = (0,0, L), with y the strain. The computations are carried out in DP 
accuracy on CPU. 
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Figure 2: The wave-space accuracy measured by e„,r(£) [Eq. (78)] as a function of the interpolation point P with various shape parameter m at 
M = 64 and fai = 0.1. The particle size effects are incorporated using (a): the real-space, (b): the hybrid, and (c); the wave-space approaches in 
Sec. 3.2. The values of m are annotated in each figure. The solid and dashed lines represent the case of y = 0 and 0.5, respectively. The dashed 
dotted lines show the exponential minimum en'or decay, eoo,r{E) ~ exp{-P7r/2). 


5.1.1. Wave-space accuracy 

Fig. 2 presents the accuracy of wave-space computation using different SE implementations with orthogonal (y = 
0) and sheared (y = 0.5) simulation boxes in solid and dashed lines, respectively. The error Coo.riE) is shown as a 
function of the interpolation point P with various shape parameter mat M = 64 and ^ai =0.1. Different particle size 
incorporation approaches discussed in Sec. 3.2 are presented: in Fig. 2a the real-space approach, in Fig. 2b the hybrid 
approach, and in Fig. 2c the wave-space approach. 

There are several key observations in Fig. 2. First of all, the errors associated with orthogonal and sheared simula¬ 
tion boxes are almost identical. This validates the general formalism for non-orthogonal simulation boxes in Sec. 3.1. 
Secondly, the SE method is sensitive to P and m, which respectively correspond to the discretization and truncation of 
the shape function h{t). At a given m, Coo.riE) first decreases exponentially, followed by a much slower reduction with 
increasing P. The two-stage reduction of Coo.riE) is well understood for point forces [29]: the exponential decrease is 
due to the improved resolution of the shape function, and the slower reduction is associated with the Gaussian trun¬ 
cation from the shape parameter m. Therefore, at large P and m the result is expected to be accurate; indeed, in Fig. 2 
the minimum errors are all close to the machine precision. Such accuracy is inaccessible using the PME or the SPME 
method at this grid number (M = 64) due to the inherent coupling between the interpolation and the wave-space 
truncation errors. Moreover, for a given P, Coo.riE) first decreases to a minimum and then increases with increasing m. 
At the minimum, Coo.riE) is transitioning from exponential to slower decay, and the errors from the shape resolution is 
about the same as the errors from the Gaussian truncation. From the error estimation of Findbo & Tornberg [29, 30], 
at a given P, the minimum wave-space error Coo.riE) and the corresponding shape parameter m are 

Coc.riE) ~ exp(-P7r/2) and m ~ VttP, (79) 

respectively. The asymptotic exponential decay of the minimum Coa.riE) is also shown as dash-dotted lines in Fig. 2. 
The exponential decay of the minimum error with respect to P to the round-off precision at large P and m clearly 
demonstrate the spectral accuracy [82] of the SE method. 

In Fig. 2 different particle size incorporation approaches exhibit similar qualitative behaviors with quantitative 
differences. For example, to achieve an accuracy of Coo.riE) ~ 10“"^ at the optimal m, in Fig. 2a, 2b, and 2c the required 
P are respectively 15, 13, and 9, corresponding to the real-space, hybrid, and wave-space approaches discussed in 
Sec. 3.2. The latter two approaches reduce the h{t) evaluations by 35% and 78% compared to the real-space approach 
at a cost of the number of required FFTs. Therefore, there is a subtle balance between the number of interpolation 
points P and the number of FFTs in the SE method implementation. The hybrid approach in Fig. 2b achieves a good 
balance between accuracy and computation efficiency, and therefore is adopted in SEASD. 

Finally, Fig. 2 shows that, in addition to the spectral accuracy and the ease of implementation, the SE method also 
allows flexible error control by adjusting P and m without changing the grid points M. As a result, the errors from 
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the wave-space summation and the interpolation can be separated, and this permits more flexible error control when 
computing His in polydisperse systems. On the other hand, such error separation is not possible in other particle mesh 
techniques such as the PME and the SPME methods. 

5.1.2. Overall mobility accuracy 

Both the wave-space and the real-space computations affect the overall mobility accuracy, and the controlling 
parameters are the grid point M, the interpolation point P, the Gaussian shape parameter m, the real-space cutoff radius 
rc, and the splitting parameter Out of the five parameters, only changes in ^ and m do not affect the computational 
cost since adjusting M affects the EET size, changing influences the neighbor search, etc. With fixed computation 
cost, i.e., fixed M, P, and it is desirable to And the combination of m and ^ that minimizes the overall error. 

Eig. 3 and 4 present the effects of m and ^ on the overall mobility accuracy with various P and for M = 64 and 32, 
respectively. The wave-space computation uses the hybrid approach in Sec. 3.2, and the simulation box is orthogonal 
(y = 0). The thick black lines in these figures indicate the theoretical optimal shape parameter m - VttP [29, 30]. 
Note that in our implementation, the cutoff radius depends on the radius a,- and aj in a particle pair. 

Eig. 3i with M - 64, P = 21, and = 6(a, -H aj) best illustrates the influences of m and Here, the mobility 
computation can reach Coo.riE) < 10*® at {^a\,m) - (0.46,8). With fixed m, Coo.riE) exhibits a minimum with 
increasing ^ai, and when m < 8, the minimum degenerates to a plateau due to the wave-space Gaussian truncation, 
which is also illustrated in Eig. 2 at low m. At low the overall error is dominated by the real-space error, which 
decreases with increasing At high f, the overall error is mainly from the wave space, and increases with increasing 

With fixed ^ on the other hand, Coa.riE) also shows a minimum with increasing m. When ^ai < 0.46, the Coo.riE) 
minimum becomes a plateau since the real-space error is independent of m. Here, the reduction of Coo.riE) with 
increasing m at small m comes almost entirely from the reduced Gaussian truncation. When ^ai > 0.46, the minimum 
plateau disappears as in this region the wave-space error is sensitive to m, a point also illustrated in Eig. 2. Eurthermore, 
in Eig. 3i there is a region of Cca.riE) > 1 at high ^ and low m due to large wave-space errors. 

Comparison across rows and columns in Eig. 3 and 4 reveals the influences of and P on the overall accuracy, 
respectively. Eor both cases, reducing or P increases the minimum value of Cca.riE) and changes the corresponding 
^ai and m. Comparing Eig. 3g, 3h, and 3i shows that reducing rc increases the real-space error and shifting the 
minimum of Coo.riE) towards larger i. The decrease of Coo.riE) with respect to increasing ^ at small ^a\ also becomes 
slower. In Eig. 3g, the Coo.riE) minimum is at ^ai > 1. Comparing Eig. 3i, 3f, and 3c reveals the effects of reducing 
the interpolation point P. With diminishing P, the wave-space error increases due to poor Gaussian resolution, and 
the Coo.riE) minimum is shifted towards lower m. In addition, the overall accuracy decreases significantly for large m 
at small P, e.g., in Eig. 3c, Coo.riE) > 1 when m > 8. 

Comparing Eig. 3 and 4 shows the effect of grid point M on the mobility accuracy. Note that the color scales in 
Eig. 3 and 4 are different, and the minimum Coo.riE) in Eig. 3f and 4f is approximately the same. The most apparent 
effect of reducing M is the shrinkage of the parameter space corresponding to Coo.riE) < 1 due to the truncation 
of the wave-space sum. As a result, at M = 32, the mobility evaluation is more sensitive to ^ai compared to the 
case of M - 64. Otherwise, the qualitative aspects of Eig. 4 are similar to Eig. 3. Moreover, the thick black lines 
representing the theoretical optimal shape parameter m = VttP is almost always in the vicinity of the regions of the 
highest accuracy in both Eig. 3 and 4. This substantially simplifies the search for the optimal 

The influences of the particle number N on the overall mobility accuracy is presented in Eig. 5 for M - 32 and 
64. The simulation box size is fixed at Ljax - 23.5 in Eig. 5a, and the suspension volume fraction is fixed at = 0.05 
in Eig. 5b. Other parameters remain unchanged from the baseline case, and the mobility computation parameters are 
P - \3, m - 6.7, and = 4(a, + aj). The mobility accuracy is more sensitive to changes in L than changes in 
(p. In Eig. 5a, Coo.riE) changes little, but in Eig. 5b, the Coo.riE) minimum increases drastically with different N. The 
almost identical decrease in Cco.riE) at small ^a\ suggests the real-space error are not significantly changed by N in 
either case. The diverging Coo.riE) at higher ^a\ in Eig. 5b suggests the wave-space computation is sensitive to the 
box size at fixed P and m. This is well-known for particle mesh techniques in general [29, 67]. Therefore, to retain 
the computational accuracy with larger systems at the same volume fraction, it is necessary to increase the grid point 
M or the interpolation point P. Einally, we note in passing that the same qualitative error behaviors are found in the 
pressure moment computations. 
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Figure 3: (Color online) The overall accuracy measured in eoo^riE,) as a function of the splitting parameter ^a\ and the shape parameter m at M = 64 
for a real-space cutolf radius = 2(fl/ + aj) (left column), 4(fl/ + aj) (middle column), and 6(£?/ + aj) (right column), and the interpolation point 
P = 9 (top row), 15 (middle row), and 21 (bottom row). The thick black lines represent m = V^- The simulation cell is orthogonal (y = 0), and 
the particle size effects are accounted using the hybrid approach. 
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Figure 4: (Color online) The overall accuracy measured in eoo,r{^) as a function of the splitting pai'ameter ^a\ and the shape pai'ameter m with 
M = 32 for a real-space cutolf radius rc = 2(aj + aj) (left column), A{ai + aj) (middle column), and 6(fl, + <3y) (right column), and the interpolation 
point P = 9 (top row) and 15 (bottom row). The thick black lines represent m = yfnP. The simulation cell is orthogonal (y = 0), and the particle 
size effects are accounted using the hybrid approach. 



Figure 5: (Color online) The overall mobility accuracy measured in eoo,r{E) as a function of the splitting parameter ^ with N = 50, 100, and 200, 
and M = 32 (filled symbols) and 64 (open symbols) for (a): constant box size L/ay = 23.5 and (b): constant volume fraction ^ = 0.05. Changes 
ai'e based on the baseline case in Sec. 5.1. Other parameters are P = 13, m = 6.7, and rc = 4{ai + aj). 
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Figure 6: The accuracy of GPGPU mobility computation measured in e„,r(£)- (a): the wave-space accuracy as a function of P for various m with 
the same parameters in Fig. 2b. The GPU results are shown in black lines, and the CPU results in Fig. 2b are reproduced in gray lines. The values 
of m are annotated in the figure. The solid and dashed lines represent the case of y = 0 and 0.5, respectively, (b): The overall mobility accuracy 
from the GPU (solid lines) and the CPU (dashed lines) computations as a function fai with = 4(a, + aj) and m = The con'esponding M 

and P are annotated in the figure. 


5.2. Accuracy of the GPGPU implementation 

The accuracy of mobility computation using GPGPU programming discussed in Sec. 3.4 is presented in Fig. 6. 
Clearly, the GPU computations provide sufficient accuracy for dynamic simulations. Fig. 6a shows the GPU wave- 
space accuracy as a function of the interpolation point P for various shape parameters m for orthogonal (j - 0) and 
sheared (y - 0.5) simulation boxes. Here, the particle size effects are incorporated using the hybrid approach in 
Sec. 3.2, and the SE method parameters are identical to those of Fig. 2b. Moreover, for comparison the data in Fig. 2b 
are reproduced in gray. In Fig. 6a, the GPU results in black lines are indistinguishable from the CPU results in gray 
lines when eoo,r{E) > 10^^ for all m and y, indicating that the GPU computations are only limited by the SP arithmetics. 
When the error eoc,r{E) reaches 10“^, increasing the interpolation point P does not improve the computation accuracy 
on GPUs, while the error in the CPU computations using DP arithmetics continue to decrease until eoo,r{E) ~ 10 
In addition, the wave-space error remain eooj{E) ~ 10“^ after reaching the SP limit even with further increase in P, 
i.e., increasing P does not adversely affect the wave-space accuracy. 

The overall GPU mobility accuracy as a function of fai is presented in Fig. 6b for two M and P combinations 
with m - VttP and = 4(a, + aj) in orthogonal simulation boxes. The errors eoo,r(E) are computed using the baseline 
case of Sec. 5.1. The GPU results are shown in solid lines and the CPU results in dashed lines. When the overall 
error eoo,r{E) > 10“^, i.e., the case of (M, P) - (32,13) in Fig. 6b, the GPU and the CPU results are indistinguishable 
from each other. However, the differences are evident for the case of (M, P) = (64,21). When 0.5 < fai < 0.85, the 
GPU computations deviate from the CPU results with larger errors due to the SP arithmetics. Beyond this range, the 
CPU and the GPU results overlap again. In both cases, the accuracy achieved by the GPU mobility computation is 
sufficient for dynamic simulations, where the error tolerance is typically set at 10“^. The results in Fig. 6 dispel any 
concerns over the SP accuracy in the GPU mobility computations for dynamic simulations. 

5.3. Overall performance 

Fig. 7 presents the overall performance of various implementations of the SEASD and the conventional SD as a 
function of the system size N. The program performance is characterized by the wall time, i.e., the actual time of 
program execution, to march 100 steps in a dynamic simulation of Brownian suspensions at Pe = 1 starting from 
an equilibrium configuration. The suspension composition is T = 2, y 2 = 0.5, and f - 0.45. The SEASD mobility 
computation parameters are fixed at M = 32, P - 11, = 4(a, + aj) with appropriate f and m as they provide 

sufficient accuracy. The tolerance of the iterative solvers is set at 10“^. Eor SEASD the far-field Brownian forces 
are calculated using Eqs. (75) and (76) with Ar = 0.2, and for SEASD-nf the far-held diffusivities are from Table 1. 
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Figure 7: (Color online) The wall time (in second) of 100 time steps in dynamic simulations at Pe = 1 as a function of the particle number N using 
the conventional SD, SEASD, and SEASD-nf. The open symbols represent the CPU mobility computation and the hlled symbols the GPU mobility 
computation. The dashed line show the scaling, and the dash-dotted line show the 0(N\ogN) scaling. The suspension is bidisperse with 

/! = 2, y 2 = 0-5, and cj) = 0.45 starting from equilibrium configurations. 


The conventional SD result is from an efficient polydisperse implementation [11, 12, 83]. All the timing results are 
collected from a workstation with Intel i7-3770K CPU and NVIDIA GeForce GTX 680 GPU. 

Fig. 7 demonstrates the expected (9(V log V) asymptotic scaling of various SEASD implementations, highlighted 
by the dash-dotted line. The implementations with the CPU mobility computation are shown in open symbols and 
the GPU mobility computation in filled symbols. The GPU SEASD has almost the same time scaling as the CPU 
SEASD-nf at all the system size N. Both are almost an order of magnitude faster than the CPU SEASD at a typical 
system size N ^ 200. This clearly demonstrates the power and promise of GPGPU programming in the dynamic 
simulation of colloidal suspensions. More significant speedup is achieved by combining the mean-field Brownian 
approximation and the GPU mobility computation. In this case, the speedup of GPU SEASD-nf computation relative 
to the CPU SEASD ranges between 40 times for small systems and 15 times for large systems. We believe further 
speedup is still possible by optimizing the GPU implementation. With the speedup shown in Pig. 7, we are able 
to study dynamics of larger systems at longer times. In addition, compared to the conventional SD, all the SEASD 
implementations are faster at large enough N due to their favorable scaling. Here, the conventional SD scales as 
highlighted by the dashed line in Pig. 7. This peculiar scaling is a combined effect of the pairwise grand 
mobility tensor construction and explicit matrix inversion. At V » 1000, the scaling should recover (9(A^). In Pig. 7, 
the break-even between the CPU SEASD and SD is V = 216, and for GPU SEASD at A ^ 40. At all the system sizes 
studied here, the GPU SEASD-nf is always faster than the conventional SD. 

6. Static and dynamic simulation results 

6.1. Short-time transport properties 

In this section we present static SEASD simulation results on the short-time transport properties of monodisperse 
and bidisperse hard-sphere suspensions. With the iterative computation scheme in Sec. 4.1, the short-time translational 
and rotational self-diffusivities, instantaneous sedimentation velocities, and high-frequency dynamic shear and bulk 
viscosities can be straightforwardly evaluated. Other transport properties can also be calculated with an appropriate 
computation scheme. 

The suspension short-time limit refers to a time scale t satisfying t/ « f tq, where t/ is the inertial time and tq 
is the diffusion time. The inertia time t/ = ^p^a^lrja, where Pp and are the characteristic particle density and radius, 
describes the time required for the particle momentum to dissipate by interacting with the solvent. When r/ t, the 
particle momentum dissipates almost instantaneously and the particle dynamics are completely overdamped. The 
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Table 1: The polynomial coefficient fitted from the far-field diffusivities in Fig. 9. The data is for polydisperse suspensions with A = 2 and y 2 = 0.5. 
The fai'-field self-diffusivity can be expressed as Ido = 1 + C[(f) + C24>^ + where do is the single particle diffusivity. 



ds.x 

^s.2 

d,A 

jr,ti 

^s.2 

Cl 

-1.27 

-1.70 

-0.207 

-0.538 

C2 

0.536 

1.005 

-0.131 

-0.312 

C3 

-0.018 

-0.12 

-0.091 

0.19 


diffusion time tq - 6nriQaplksT characterizes the time scale of suspension configuration change and t To ensures 
that the transport properties entirely arise from the (instantaneous) His. Therefore, they are only determined by 
the configuration X, and can be calculated by sampling independent but equivalent configurations. In this work 
we use the Monte-Carlo procedure of Wang & Brady [11]; the hard-sphere configurations are first generated by an 
event-driven Lubachesky-Stillinger algorithm [84, 85], followed by a short equilibration. The transport properties are 
then computed statically. Here we compare the results from the SEASD with CPU mobility computation with our 
recent conventional SD results [11]. Although SEASD and SD are based on the same formalism, the grand mobility 
tensor 21t constructed from SD includes an additional mean-field quadrupole term [63], which can have quantitative 
consequences. Eor bidisperse hard-sphere suspensions, we focus on the composition with 4 = 2 and y 2 - 0.5. In the 
SEASD computations, the system size is A = 800, and the results are averaged over 500 independent configurations. 
Note that for simple cubic array of monodisperse particles, SEASD produces identical results as those of Sierou & 
Brady [36]. 

6.1.1. Short-time translational and rotational self-diffusivities 

The microscopic definition of the short-time translational and rotational self-diffusivities, c/' ^ and d''^ ^ respectively, 
for homogeneous suspensions are. 




(80) 


where ^ is a vector of unit length for the averaging process and ffl and fff are respectively the diagonal blocks of the 
force-linear velocity and torque-angular velocity couplings in Note that / 6 a in Eq. (80) suggests the summation 

is restricted to particles of species a. The diffusivities are computed using the matrix-free approach of Sierou & 
Brady [36]: the velocity disturbance "iff corresponding to a stochastic external force satisfying {ff^) - 0 and 
= -T is evaluated. It is straightforward to show that the ensemble average allowing 

extraction of the diffusivities in Eq. (80). 

The computed short-time translational self-diffusivities d[^ exhibit a strong A^03 dependence due to the 
periodic boundary conditions. The size dependence from an A-particle system can be eliminated by adding the 
following quantity to the results. 


Aatc/' 


1.76fifQi rjolfy 

(xi + X2A^A 


( 81 ) 


where d'^ j = k^Tj{bnrfQax) is Stokes-Einstein-Sutherland diffusivity for species 1, and tJs is the high-frequency dy¬ 
namic shear viscosity from the same configurations. The shear viscosity exhibits little size dependence, and can be 
directly used. The effectiveness of Eq. (81) has been demonstrated by Wang & Brady [11] in the wave-number- 
dependent hydrodynamic functions. The results here always contain this finite size N correction. 

Eig. 8a and Eig. 8b respectively present d[ ^jd^ ^ and ^jd^ ^ of monodisperse and bidisperse suspensions, where 
the single particle translational and rotational self-diffusivities are d^^ - k^TI{6nT]Qaa) and d'^^^ - k^Tfi^nriQal). 
The SEASD results, shown in symbols, agree well with the conventional SD results shown in lines. As expected, 
both dl ^ and d[ ^ decrease with increasing volume fraction and for bidisperse suspensions, the small particles 
show diffusivity enhancement while the large particles exhibit diffusivity supression. Compared to d[ q., „ are less 

sensitive to the volume fractions f, but more sensitive to the particle sizes d. The SEASD results for large particles 
show larger error bars compared to the SD results [11], most likely due to the stochastic computation procedure. 
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Figure 8: (Color online) The species short-time (a): translational and (b): rotational self-dilfusivities, and respectively, as a function of 
the total volume fraction ^ for monodisperse and bidisperse hard-sphere suspensions with A = 2, y 2 = 0.5. The results are scaled with the single 
particle translation and rotational dilfusivity, d^^ and df^^, respectively. The SEASD results are shown in symbols and the conventional SD results 
from Wang & Brady [11] are shown as lines. 





Figure 9: (Color online) The species far-held short-time translational and rotational self-dift'usivities, d^la respectively, as a function of the 

total volume fraction 0 for bidisperse hard-sphere suspensions with A = 2 and y 2 = 0.5. The results scaled with the single particle translation and 
rotational dilfusivity, d^ ^ and dg respectively. The symbols are the computation results, and the dashed and the dash-dotted lines are polynomial 
httings for the small and the large particles, respectively. 
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Figure 10: (Color online) The scaled species instantaneous sedimentation velocity, Us^alUo^a^ as a function of the total volume fraction (p for 
monodisperse and bidisperse hard-sphere suspensions with A = 2 and y 2 = 0.5. The single particle sedimentation velocity is The SEASD 
results are shown in symbols and the conventional SD results from Wang & Brady [11] are shown as lines. 


We also calculated the far-held short-time translational and rotational self-diffusivities and where “ff” 
suggests only the far-held His without the lubrication corrections are considered. They are the input for subsequent 
SEASD-nf computations in Sec. 6.2 and 6.3. The size dependency in the far-held translational diffusivity 

is corrected using Eq. (81) with the corresponding far-held viscosity. Eig. 9 shows and c/'’® for bidisperse 
suspensions up to = 0.62. Compared to Eig. 8, the far-held diffusivities exhibit weaker volume fraction dependence, 
and they do not have sharp reductions at high volume fractions. Consistent with Eig. 8, d^^ also exhibits stronger 
particle size dependence compared to its translational counterpart. In general, the (ft dependence of any scaled far-held 
diffusivity d^ jdo, with c/q the corresponding single-particle data, can be adequately captured by a cubic polynomial 
d^/d{) - 1 H- Ci(p + C 2 <p^ + cj,<p^, where the coefficients c,, i 6 {1,2,3), only depend on the suspension composition. The 
htting coefficients for bidisperse suspensions with A - 2 and y 2 - 0.5 are presented Table 1. The polynomial httings, 
also shown in Eig. 9 in dashed and dash-dotted lines for the small and the large particles, respectively, indeed describe 
the computation data. Not shown in Eig. 9 are the SEASD far-held diffusivities for monodisperse suspensions, which 
are identical to those of Banchio & Brady [39]. 

6.1.2. Instantaneous sedimentation velocity 

The species instantaneous sedimentation velocities Us,a are computed by applying a uniform external force Fa to 
each species. Eor bidisperse suspensions, the sedimentation velocity Us,a also depends on the species density ratio [8], 
y - Ap 2 /Api, with Apo, - Pa - po the density difference of species a. The species force ratio satishes F 2 IF 1 - yA^, 
and here we set y = 1 to facilitate comparison with earlier results. To eliminate the A size dependence, the 
following corrections are added to the results: 




^nUs,2 - 


1 - 761 / 0,1 m 
{xi + X2A^A 7 s 
1 - 761 / 0,1 m 
{xi + X2A^A 7 s 



5ii(0) + tV 


.[^SniO) 

V Xi 


,/^52l(0) + dV22(0) 

V -^2 


(82) 

(83) 


where Uo,a - Fal(67TT]oaa) is the single particle sedimentation velocity and 5 q^( 0) is the partial static structural factors 
in the zero wave number limit. Eqs. (82) and (83) are based on the hnite-size correction for partial hydrodynamic 
functions [11]. Here, the partial static structural factors are computed from the poly disperse Percus-Yevic integral 
equations [86-89]. 
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Figure 11: (Color online) The high-frequency dynamic (a): shear viscosity t]s and (b): bulk viscosity Ks as functions of the total volume fraction (p 
for monodisperse and bidisperse hard-sphere suspensions with A = 2 and y 2 = 0.5. The results are scaled with the solvent viscosity and only 
the particle contributions, “ 1 and - /^o)/^o are presented. The SEASD results are shown as symbols and the conventional SD results [11] 
ai‘e shown as lines. 


Fig. 10 presents the SEASD Us,alUo,a in symbols, which are not the identical to the conventional SD results shown 
in lines. The difference is especially pronounced at high volume fractions. For monodisperse suspensions, the SEASD 
and the conventional SD agree with each other satisfactorily up to 0 a; 0.3, and at higher (p, the SEASD results become 
significantly higher. This difference is from the mean-field quadrupole term, which is absent in SEASD. Despite the 
quantitative differences, the SEASD monodisperse sedimentation velocity remain positive and physical. A similar 
overestimation of the sedimentation velocity is also found when comparing ASD results [36] and the conventional SD 
results [63] for simple cubic arrays. 

The differences between the SEASD and the conventional SD results are more significant for bidisperse suspen¬ 
sions. For Us,\ of the small particles, the differences are not evident until cp - 0.3, and for Us ,2 of the large particles, 
the differences are obvious even at ^ 0.2. Moreover, Us ,2 exhibits a minimum and increases with (p at higher volume 
fraction, leading to a crossing of Us,\ and Us ,2 at = 0.45. These unphysical behaviors are caused by inaccurate 
HI computations at the stresslet-strain rate level. Apparently, the His of the large particles, which are surrounded by 
many small particles, are more complicated than those of the small particles and more difficult to capture accurately. 
Note that for sedimentation the lubrication interactions are not important and one must rely on the far-held mobility 
for all His. 

Fig. 10 also illustrates that sedimentation problems in dense bidisperse suspensions, even at T = 2, is challenging 
for SEASD. Incorporating the mean-held quadrupole term [63], (1 - ^(p), in the grand mobility tensor can signihcantly 
improve the results [11]. However, such incorporation is not carried out in this work. 

6.1.3. High-frequency dynamic shear and bulk viscosities 

The high-frequency dynamic shear and bulk viscosities, and Ks, are respectively dehned as, 

77, = 7/0 -I- n{S^)^yly, and Ks^kq + |77<S'^> ; lie, (84) 

where j is the imposed strain rate, e is the imposed uniform expansion rate, is the hydrodynamic stresslet in 
Eq. (65), and the subscript xy denotes the velocity-velocity gradient component. They are directly computed from 
SEASD and exhibit little size dependencies. Experimentally, 77 , and Ks are measured by imposing high-frequency, 
low-amplitude deformations, such that the suspension microstructures are only slightly perturbed, and the Brownian 
stress contributions are out of phase with the applied deformations [90]. 

Fig. 11a and 11b present the volume fraction (p dependency of the particle contributions to the high-frequency 
dynamic shear and bulk viscosities, 77^/770 - 1 and (a-, - Ko)lqo, respectively. The SEASD calculations are shown in 
symbols, and the corresponding conventional SD results are shown in lines. For 77 ,, the SEASD and the conventional 
SD results agree well over the entire f range. The results for monodisperse and bidisperse suspensions with A- 2 are 
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Figure 12: (Color online) The equilibrium osmotic pressure Il/(nk^T) of monodisperse and bidisperse Brownian suspensions with A = 2 and 
y 2 = 0.5, as a function of volume fraction (p. The dashed line represents the CS equation of state, Eq. (86), and the dash-dotted line represents the 
BMCSL equation of state, Eq. (87). 


almost identical when (p < 0.55. At higher volume fractions, the monodisperse rjs are more sensitive to <p compared 
to the bidisperse results, as introducing particles of difference sizes significantly alters the suspension hydrodynamic 
environment in this limit. Unlike sedimentation, for the shear viscosity lubrication interactions are important and 
dominate the behavior at high (p. 

For the high-frequency dynamics bulk viscosity Ks in Fig. 1 lb, the SEASD and conventional SD results show 
qualitative agreement with noticeable quantitative differences at moderate (p: the SEASD results are higher and less 
sensitive to the particle size ratio A. The differences are caused by different pressure moment computation procedures. 
Recall that the far-held grand mobility tensor DJI is not symmetric by construction, and the symmetry of 971 ' must 
be restored for subsequent calculations. This is done in conventional SD by explicit copy of matrix elements after 
the matrix inversion [91]. This is not applicable for the matrix-free computation of DJI in SEASD. Here, the pressure 
moment is computed from the far-held forces and stresses. Eig. 1 lb shows that the two conceptually equivalent 
approaches do lead to small quantitative differences. Moreover, for dense suspensions, such differences are masked 
by the dominance of lubrication interactions. Therefore, the SEASD and the conventional SD results agree well at 
low and high (p. Near the close packing limit, Ks for bidisperse suspensions is signihcantly lower than that of the 
monodisperse case, since the particle size polydispersity improves the particle packing. 

6.2. Equilibrium suspensions 

Here we present the dynamic simulation results with SEASD and SEASD-nf for monodisperse and bidisperse 
Brownian suspensions at zero Peclet number. In particular, we are interested in the following equilibrium prop¬ 
erties: the osmotic pressure H, the high-frequency dynamic bulk modulus K'^, and high-frequency dynamic shear 
modulus, G'^. The dynamic simulations are carried out with 100 particles over 200 diffusive time units with a time 
step Afc/pj/aj = 10^^. The mobility computation in SEASD is performed on GPUs with M = 32, P = 11, and 
rc = 4(a, -I- Gj), and the far-held Brownian force is calculated using the IVP method in Sec. 4.3 with Ar = 0.1. The 
tolerance for the iterative solver is 10“^ and the tolerance for matrix inversion in Eqs. (75) and (76) is 0.02. The com¬ 
position of bidisperse suspensions are d = 2 and y 2 = 0.5. Therefore, for the SEASD-nf computations the coefficients 
in Table 1 are used. Note that with Pe = 0, SEASD-nf computations do not contain far-held mobility evaluations. 

6.2.1. Osmotic pressure 

The osmotic pressure of an equilibrium suspension is dehned as 

n = ukBT-:/, (85) 
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Figure 13: (Color online) The high-frequency dynamic moduli: (a) the bulk modulus K'^c?^l(k^T), and (b) the shear modulus G'^c?^l(k^T), as 
functions of volume fraction 0 for equilibrium monodisperse and bidisperse Brownian suspensions with A = 2 and y 2 = 0.5. The results are 
computed from SEASD (filled symbols) and SEASD-nf (open symbols). 


where (S®) is the Brownian stresslet in Eq. (67). For rigid particles with no-slip boundary conditions, Brady [51] 
showed that the osmotic pressure is purely hydrodynamic in origin, and is identical to that of a hard-sphere fluid. The 
osmotic pressure of monodisperse suspensions is well described by the Carnahan-Starling (CS) equation up to the 
fluid-solid transition, 

n _ i + + 

nk^T ( 1 - 0 )^ 

The CS equation of state is extended to polydisperse suspensions as the Boublik-Mansoori-Carnahan-Starling-Leland 
(BMCSL) equation [92]; 

n + - 3<^(zi H- Z24>) - Z3<p^ 

nk^T~ (1-0)3 ’ 

wherezi = An{l + A) / ^|A, zi = Ai 2 (yi/lH-y 2 )/VI, andzs = [(y3xi)‘/3+ (^2^2)'^3]3 ^ ^fx^{A-if! A. 

Fig. 12 presents the equilibrium osmotic pressure of monodisperse and bidisperse suspensions with A - 2 and 
y 2 = 0.5 as functions of 0 using SEASD and SEASD-nf computations. The CS [Eq. (86)] and the BMCSL [Eq. (87)] 
equations of state at the corresponding bidisperse compositions are respectively shown in dashed and dash-dotted 
lines. Also shown in Fig. 12 are the static computation results with N = 200, denoted “static”. The static computations 
do not consider particle dynamics, and calculate the osmotic pressure by taking a full Brownian step from independent 
particle configurations in a Monte-Carlo fashion. In Fig. 12, at each volume fraction 500 independent configurations 
are used in the static computations. 

The osmotic pressures from the SEASD, the SEASD-nf, and the static computations agree with the CS and BM¬ 
CSL predictions in Fig. 12. The static computations show the best agreement over the entire 0 range, and this directly 
validates the Brownian stress computation method in Sec. 4.3. The dynamic SEASD results are slightly higher than 
the theoretical predictions because the configuration evolution is affected by the finite At in the far-field Brownian 
force computation. The slight difference does not invalidate this approach as it is well within the discretization errors 
of Eqs. (75) and (76). Note that, as long as the tolerances for the iterative solution of Eqs. (75) and (76) are smaller 
than the discretization step size At, the principal source of error is the time discretization. We have verified that reduc¬ 
ing the iterative solver tolerance with fixed At does not improve the results. Finally, the agreement in the bidisperse 
osmotic pressures from SEASD-nf and the BMCSL equation validates the extension of the mean-field Brownian ap¬ 
proximation to polydisperse systems. The SEASD-nf results are only slightly lower than the theoretical predictions, 
which is acceptable considering the substantial speedup offered by this approach. 
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6.2.2. High-frequency dynamic moduli 

The suspension high-frequency dynamic bulk and shear moduli, K'^ and G'^ respectively, can be computed from 
the short-time limit of the pressure-pressure and stress-stress autocorrelation functions [91, 93, 94], i.e., 

K'^ = bin ^<OT(f)OT(0)), and G’^ = liin ^{cr(t)crm, (88) 

t^Q IcbT t^o IcbT 

where dlf is the osmotic pressure fluctuations and cr is the off-diagonal components of the bulk stress (Z) in Eq. (64). 
Note that the viscoelasticity of colloidal suspensions is entirely of hydrodynamic origin, and without His, e.g., in 
hard-sphere fluids, these moduli are infinite. 

Fig. 13a and 13b respectively present K'^ and G'^ of monodisperse and bidisperse suspensions as functions of 
(j) from the same SEASD and SEASD-nf dynamic simulations of Fig. 12. Both K'^ and G'^ grow rapidly with (p, 
and at the same volume fraction, the monodisperse moduli are always higher. In Fig. 13a, the bulk modulus K'^ 
computed from SEASD and SEASD-nf share the same qualitative behavior. However, the SEASD results are almost 
always higher than the SEASD-nf results except at small f, and their differences grow with increasing f. This 
is consistent with the growing differences in H with increasing f in Fig. 12. On the other hand, in Fig. 13b the 
differences in the shear modulus G'^ between the SEASD and the SEASD-nf results decrease with increasing f, with 
the SEASD-nf data higher at low volume fractions. Note that the bidisperse SEASD results show large fluctuations 
when (p - 0.2 ~ 0.25, most likely due to the small number of large particles at A = 100 and the particular particle 
spacing at this volume fraction. Finally, small differences in fluctuation quantities such as K'^ and G'^ are expected 
for SEASD and SEASD-nf because the Brownian stresses are computed differently. However, more importantly, the 
same qualitative behaviors are followed in both methods. 

6.3. Rheology of bidisperse suspensions 

The final validation of SEASD and SEASD-nf is the steady shear rheology of Brownian suspensions at constant 
strain rate. Both monodisperse and bidisperse hard-spehre suspensions are considered; the volume fractions are fixed 
at 0 = 0.45 in both cases, and the bidisperse composition is T = 2 and y 2 = 0.5. The results are extracted from SEASD 
and SEASD-nf simulations with GPU mobility computation over a wide range of Peclet number Pe = 6nqoa^^yl(ksT). 
Moreover, we introduce a small excluded volume on each particle to emulate the effects of surface asperities or 
polymer coating and to prevent particle overlap. It is characterized by, 

b = 1 - ailbi, (89) 

where bi is the excluded volume radius for each particle. The SEASD and SEASD-nf simulations are carried out at 
b = 5 X 10““^ with N = 200 over 150 dimensionless time units with a step size 10“^. Other simulation parameters 
are similar to those in Sec. 6.2. The data are averaged in segments after the steady state is reached, usually after 
20 dimensionless time units. As is customary, the .r-direction is the velocity direction, the y-direction is the velocity 
gradient direction, and the z-direction is the vorticity direction. 

6.3.1. Shear viscosity 

Fig. 14a and 14b respectively present the Brownian viscosity and the flow viscosity as functions of the 
Peclet number. These viscosities are defined as 

qB ^ n{S^},yly and q^ = n{S\yly, (90) 

with (S®) in Eq. (67) and (S^) in Eq. (65). In this figure, the monodisperse data are shown in squares and the 
bidisperse data in triangles, with the SEASD results in filled symbols and the SEASD-nf results in open symbols. 
For comparison, the SD results of Foss 8c Brady [72] for monodisperse suspensions are presented in open circles. To 
clarify the effects of the excluded volume parameter b on viscosities, another set of monodisperse SD simulations with 
A = 30 are performed at b = 5 x 10 and 10“^, and the results are shown as crosses and pluses respectively. In all 
cases, the stress contributions from inter-particle forces are negligible, and therefore are not presented. 

In Fig. 14 both the Brownian viscosity q^ and the flow viscosity q^ exhibit the expected behaviors; with in¬ 
creasing Pe, qB decreases (shear-thinning) and q^ grows (shear-thickening). In addition, there are several important 
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Figure 14: (Color online) Different viscosity contributions to the rheology of monodisperse and bidisperse hard-sphere suspensions: (a) the 
Brownian viscosity r/^/rjo and (b) the flow viscosity T]^/riQ, as functions of Pe. The volume fraction (ft = 0.45 in both cases, and the bidisperse 
composition is i = 2 and y 2 = 0.5. 


observations. First of all, the excluded volume parameter 6 introduces quantitative effects on the suspension rheology, 
especially at high Pe. Comparing the SD results with d = 5 x 10 and 10“^, increasing d enhances the shear-thinning 
of Tj^ and weakens the shear-thickening of ij^, especially at high Pe. At low Pe, the effect of d is almost unnoticeable. 
The SD results at b = 10“^ agree well with those of Foss &. Brady [72], and the results at b = 5 x 10 are consistent 
with the monodisperse SEASD and SEASD-nf results, with larger differences shown in . This difference is most 
likely due to the number of particles in the computations. Next, the bidisperse Brownian viscosity 77 ^ is always lower 
than the monodisperse value at all Pe, and for the flow viscosity their difference is most apparent at high Pe. The 
large difference in rj^ at high Pe suggests distinct His and structures between the monodisperse and the bidisperse 
suspensions, since Pig. 11a suggests 77 ^ is insensitive to equilibrium suspension structures at 0 = 0.45. Pinally, the 
SEASD and SEASD-nf results in Pig. 14 almost always overlap each other, showing that the mean-held Brownian 
approximation is valid over the entire Peclet number range. At high Pe, the Brownian viscosity 77 ® from SEASD shows 
larger Huctuations compared to the SEASD-nf results as the Brownian stresses are difhcult to compute with highly 
anisotropic structures. However, these Huctuations do not affect the overall viscosity since the Brownian contribution 
at high Pe is insignihcant. 

6.3.2. Non-equilibrium osmotic pressures 

Pig. 15a and 15b present the Brownian and the How contributions to the suspension osmotic pressure, 

H® = nkyiT - i 77 <S®) ; / and H^ = -| 77 <S'^> ; /, (91) 

respectively, as functions of Peclet number Pe. In these figures, the scaling for the Brownian contribution is nk^T 
and the scaling for the flow contribution H^ is 7707 . Similar to Pig. 15, the monodisperse data are presented in 
squares and the bidisperse data in triangles, with the SEASD results in filled symbols and SEASD-nf results in open 
symbols. Pig. 15 also presents the N -2>0 monodisperse SD results with 6 - 5x 10“"^ and 10“^ in crosses and pluses, 
respectively. Similarly to the shear stresses, the inter-particle contribution to the osmotic pressures is also negligible 
compared to the contributions from His. 

In Pig. 15, both jink^T) and n^/(y 77 o) grow with increasing Pe when Pe < 100. The Brownian contribution 
Unkr^T) asymptotes the equilibrium value as Pe ^ 0. At higher Pe, the influence of the excluded volume param¬ 
eter S becomes apparent. Por the Brownian osmotic pressure contribution l{nk^T), the SD results at <5 = 10“^ 
continuously grow with Pe up to Pe = 10^, the highest value in our study, while with 6 - 5 x 10 a maximum 
in Unk^T) around Pe = 10^ is apparent. After the maximum, n^/( 77 kBr) decreases slowly with growing Pe. In 
this case, the parameter 6 not only brings quantitative, but also qualitative differences. On the other hand, the flow 
osmotic pressure contribution n^/(y 77 o) increases and reaches a plateau at high Pe. Comparing the SD results with 
d = 5 X 10 and 10“^, increasing 5 reduces the final plateau value of Hypo) at a smaller Pe. Apparently, the high 
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Figure 15: (Color online) Different contributions to the osmotic pressures of monodisperse and bidisperse hard-sphere suspensions: (a) the Brow¬ 
nian contribution scaled with nk^T, /{nk^T), and (h) the flow contribution scaled with 7]oy, Hyifo), as functions of Pe. The volume fraction 
is (^ = 0.45 in both cases, and the bidisperse composition is d = 2 and y 2 = 0.5. 


Pe osmotic pressure is very sensitive to the excluded volume parameter 6. In terms of the normal viscosity, i.e., If/y 
with n = -H n^, increasing 6 weakens the shear thickening of the normal viscosity. Furthermore, the SD results at 
5 - IQ-^ agree qualitatively with the results of Yurkovetsky & Morris [53], with slight quantitative difference due to 
different osmotic pressure computations. At d = 5 x 10 "^, the Brownian osmotic pressures 11* from SD and SEASD 
almost overlap each other in Fig. 15a, and D* from SEASD is lower than the SD results in Eig. 15b. Similarly to 
Eig. 14b, the difference is most likely due to the small system sizes in the SD computations. Moreover, the SEASD 
n* also exhibits larger error bars at high Pe due to the Brownian stress computation, but such errors are of little 
consequences on the suspension total osmotic pressures. 

Eor the bidisperse results shown in triangles in Pig. 15, the Brownian osmotic pressure fl* is always lower than 
its monodisperse counterpart, and the bidisperse 11* is first slightly higher than the monodisperse results at low Pe 
and then lower at high Pe. The crossing of the monodisperse and bidisperse 11* demonstrates the complex interplay 
between His and structures in polydisperse systems. 

The SEASD-nf results in Pig. 15 agree qualitatively with the SEASD computations. However, for H*, there are 
quantitative differences at both 4=1 and A - 2, with the SEASD-nf results systematically lower. This difference 
is inherently associated with the far-held Brownian force computations in Sec. 4.3 and the mean-held Brownian 
approximations, and is also encountered in Pig. 12. However, the quantitative discrepancies in H* are still within the 
discretization errors of At in Eqs. (75) and (76). On the other hand, for H*, the SEASD-nf and SEASD results almost 
always overlap each other over the entire Pe range for both bidisperse and monodisperse suspensions. SEASD-nf 
satisfactorily captures both contributions of the suspension osmotic pressures, H* and H*. 

6.3.3. Normal stress differences 

The hrst normal stress difference Ni and the second normal stress difference N 2 , dehned as 

Nl = {Zh, - {L)yy and N 2 = {L)yy - {L)^, (92) 

describe the stress anisotropy in sheared suspensions, and are important for understanding phenomena such as the 
shear-induced particle migrations [52]. The normal stress differences N\ and N 2 are respectively shown in Pig. 16a 
and Pig. 16b. The monodisperse data are shown in squares and the bidisperse data in triangles, with SEASD results 
in filled symbols and SEASD-nf results in open symbols. In addition, in Pig. 16, the SD results of Poss &. Brady [72] 
are presented in circles, and the SD computations at A = 30 with d = 5 x 10 "* and 10-^ are respectively shown in 
crosses and pluses. 

In general, the first normal stress difference Ni in Pig. 16a changes sign from positive to negative with increasing 
Pe, and the second normal stress N 2 in Pig. 16b remains negative for all Pe studied and exhibits weak Pe dependence. 
The data with small systems are strongly scattered, particularly at small Pe. Eor monodisperse suspensions, the 
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Figure 16: (Color online) The normal stress differences: (a) the first normal stress difference Ni and (b) the second normal stress difference N 2 as 
functions of Peclet number Pe. The volume fraction is 0 = 0.45 in both cases and the bidisperse composition is T = 2 and y 2 = 0.5. 


excluded volume parameter 6 has little effect on A^i or N 2 , as there lacks a qualitative difference for the SD results at 
d = 5 X 10 and 10“^ in Fig. 16. These SD results in general agree with the data of Foss & Brady [72] when Pe > 1. 
At smaller Pe, the data exhibit large errors due to fluctuations in Brownian stresses, making quantitative comparisons 
difficult. 

In Fig. 16 the SEASD results at d = 1 follow the SD data with the same qualitative behaviors. The differences 
at low Pe is likely associated with the difficulties in measuring the fluctuating Brownian normal stresses. In addition, 
the SEASD results show clearer trends at high Pe thanks to larger system sizes: both Ni and N 2 asymptote toward 
constant values with increasing Pe. Particle size polydispersity weakens the influences of Pe on the first normal stress 
difference Ni. In Pig. 16a, the bidisperse Ni are less sensitive to Pe compared to the monodisperse case, and as 
Pe —> 00 , the bidisperse Ni asymptotes towards a negative value with a smaller magnitude. On the other hand, the size 
polydispersity has little effect on the second normal stress N 2 , as the bidisperse N 2 almost overlaps the monodisperse 
N 2 , especially at large Pe. 

The SEASD-nf and the SEASD results agree satisfactorily when Pe > 10 for both the monodisperse and bidisperse 
suspensions. As expected, larger differences are found at low Pe, as the mean-field Brownian approximation in 
SEASD-nf explicitly removes the anisotropy in the far-field mobility tensor. However, the SEASD-nf results still 
capture the qualitative aspect of Ni and N 2 even in the low Pe limit. 

6.3.4. Species stress distribution 

Stress distributions across different species are key to understand the phenomena of particle migration and seg¬ 
regation in polydisperse suspensions [95], and are presently only accessible from simulations. Pig. 17 presents the 
stress distribution, expressed as the stress fraction taken up by the small particles (species 1), as functions of Pe for 
bidisperse suspensions with (ft - 0.45, A - 2, andy 2 = 0.5. Pig. 17a shows various shear stress fractions. In terms of 
the definitions in Eqs. (64)-(67), cri/cr (circles), cr^/cr* (squares), and cr^/cr^ (triangles) in Pig. 17a are 

0 - 1 / 0 - = and o-f/o-^ = (93) 

where (-/q. indicates averaging with respect to species a. Pig. 17b presents various normal stress fractions. The normal 
stress fractions S i/5 (circles), S f(squares), and (triangles) in Pig. 17b are similarly defined as 

5 1/5 = vi(/ : {L),)I{1 : {!)), - x,{l : <S»)i)/(/: {S^)), and = xi(/: {S^h)l{l: (S^)). (94) 

In both figures, the SEASD results are shown in filled symbols and the SEASD-nf results are shown in open symbols. 

Pig. 17a illustrates that the total shear stress is roughly equally partitioned between the two species, and the fraction 
0 - 1 / 0 - is almost constant with respect to Pe. This is largely because the flow shear stress fraction cr^lcr^ is insensitive 
to Pe. The Brownian shear stress fraction cr^ Icr^, on the other hand, exhibits weak Pe dependence: the ratio cr^lcr^ 
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Figure 17: (Color online) The fraction of stresses taken up by the small particles (species 1) in a bidisperse suspension: (a) the fraction of the shear 
stress and (b) the fraction of the normal stress. The stress fractions are shown as functions of Pe. The composition of the bidisperse hard-sphere 
suspension is (p = 0.45, T = 2, and y 2 = 0.5. 


increases with Pe from less than 0.45 at Pe = 0.1 to close to 0.6 at Pe = 100. At higher Pe, the Brownian stress fraction 
shows large fluctuations, also due to the difficulties associated with the anisotropic structures. However, in this limit, 
the Brownian contribution to the total stress is small, and the large fluctuations in Fig. 17a is inconsequential. On the 
other hand, the total normal stress fraction 5i/5 in Fig. 17b shows stronger Pe dependency, and it decreases from 
0.6 at Pe = 0.1 to 0.45 at Pe = 10^. Contrary to shear stress distributions in Fig. 17a, the Brownian normal stress 
distribution S^/S^ is almost constant at 0.6, but 5//5^ increases from 0.3 at Pe = 0.1 and asymptotes towards 0.45 
as Pe — > oo. Since the Brownian stresslet dominates at low Pe and the flow stresslet dominates at high Pe, the normal 
stress distributions in Fig. 17b are distinctively affected by both the flow and the Brownian contributions. Fig. 17 
demonstrates that both the shear and the normal stresses in bidisperse suspensions are distributed based on the species 
volume and the distribution weakly depends on Pe. This is a useful insight for modelling polydisperse systems. 

The stress distributions from SEASD-nf accurately capture the SEASD results except the Brownian shear stress 
distribution cr^/cr^ at high Pe in Pig. 17a, where the SEASD-nf results is slightly lower. This difference, however, is 
expected since the mean-field Brownian approximation ignores the structural anisotropy in the suspension. Moreover, 
the discrepancies are only evident at Peclet numbers where the Brownian stress does not affect the overall suspension 
rheology. Prom this perspective, the overall quality of the SEASD-nf approximation is deemed satisfactory. 

6.3.5. Long-time diffusion 

An important characterization of the overall suspension dynamics is the translational long-time self-diffusivities. 
The long-time limit refers to a time scale t » To, where, recall that. To - bnTjQa^lkffT is the single particle diffusive 
time scale. In this limit, the particle movement is diffusive due to extensive interactions with their neighbors. The 
corresponding diffusivities are obtained from the particle mean-square displacement. In the velocity gradient and the 
vorticity directions, these self-diffusivities are respectively defined as 

= lim 5 d((Ay)^)„/df and = lim \d{{^zf)a|dt, (95) 

f—>oo ^ >oo ^ 

where Ay and Az are the particle trajectory fluctuations in y- and z-directions. Pig. 18a and 1 8b respectively present the 
long-time diffusivities and d'^^a as functions of Peclet number. The monodisperse results are shown in squares. 
Por bidisperse suspensions, the small and the large particle long-time self-diffusivities are presented in triangles and 
circles, respectively. Por comparison. Pig. 18 also shows the results from Poss & Brady [72] in crosses. Moreover, 
the SEASD and the SEASD-nf results are shown in filled and open symbols, respectively. 

Por monodisperse suspensions in Pig. 18, both d‘^^ and d'^ grow with Pe due to the imposed shear flow, with 
the velocity direction diffusivity dff^ slightly higher. At low Pe, d^^ and di£^ grow weakly with Pe, and at large Pe, 
both diffusivities are proportional to Pe. The SEASD results is consistent with the SD results of Poss 8c Brady [72] at 
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Figure 18: (Color online) The species long-time self-diffusivities: (a) the velocity gradient direction diffusivity and (b) the vorticity direction 
diffusivity of monodisperse and bidisperse hard-sphere suspensions as functions of Pe. The volume fraction is (^ = 0.45 for both cases, and 
the bidisperse composition is T = 2 and y 2 = 0.5. 


intermediate Pe. The differences at large and small Pe are most likely due to the system size, as in this work N - 200 
while in Foss & Brady [72] N - 21. For bidisperse suspensions, the long-time self-diffusivities and d'^a for 
both species exhibit similar Pe dependencies as the monodisperse case. However, introducing a second species to the 
suspension apparently enhances the long-time self-diffusivities of both species, particularly at high Pe. This mutual 
diffusivity enhancement is in contrast to the short-time diffusivities in Fig. 8a, where at 0 = 0.45, the small particle 
diffusivity enhancement is always accompanied by the large particle diffusivity supression. Moreover, the diffusivity 
enhancement in y-direction is stronger than those in z-direction. 

In Fig. 18 the diffusivities from SEASD-nf in general agree with the SEASD results for both monodisperse and 
bidisperse suspensions. At low Pe, the SEASD-nf diffusivity is lower, particularly for the large particles. The agree¬ 
ment between SEASD and SEASD-nf improves with increasing Pe due to the reduced influences of Brownian motion. 


6.3.6. Suspension structures 

Einally, we examine the structures of sheared bidisperse suspensions via the projections of the partial pair- 
distribution functions gap{r), which are defined as the conditional probability of finding another particle in species 
f3 given a particle of species a, i.e., 


gap{r) = 



(96) 




They are related to the pair-distribution function g{r) through 


^ XaXfiga/i{r). 


a,p 


(97) 


Pig. 19, 20, and 21 present projections of g{r) and gapir) on the velocity-velocity gradient (xy-) plane, the velocity- 
vorticity (xz-) plane, and the velocity gradient-vorticity (yz-) plane, respectively, at selected Peclet numbers. These 
figures are based on particle trajectories from SEASD simulation, and are indistinguishable from the SEASD-nf 
results. 

Pig. 19 clearly displays the structural anisotropy caused by the shear flow in the xy-plane, characterized by the 
distortion of the otherwise isotropic pair-distribution rings. With increasing Pe, the overall pair-distribution function 
g{r) shows an accumulation of neighboring particles in the compressional quadrant. This is indicated by the brighten¬ 
ing and thinning of the rings at 2ai, ai + 02 , and 2 a 2 , corresponding to the particle pairs of two small particles, a large 
and a small particle, and two large particles, respectively. Meanwhile, the particle pairs are depleted in the extensional 
quadrant. 
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Figure 19: (Color online) The velocity-velocity gradient (Jty-) plane projection of the pair-distribution function g(r) and the partial pair-distribution 
functions gapif) at various Pe for bidisperse suspensions with (p = 0.45, A = 2, and y 2 = 0.5. 
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Figure 20: (Color online) The velocity-vorticity (jcz-) plane projection of the pair-distribution function g(r) and the partial pair-distribution functions 
gap{^) at various Pe for bidisperse suspensions with 0 = 0.45, A = 2, and yi = 0.5. 
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Figure 21: (Color online) The velocity gradient-vorticity (yz-) plane projection of the pair-distribution function g{r) and the partial pair-distribution 
functions ga^if') at various Pe for bidisperse suspensions with (p = 0.45, d = 2, and yi = 0.5. 


Specific changes in different types of particle pairs are revealed by examining the corresponding partial pair- 
distribution function gap{r) in Fig. 19. The distribution of the small-small particle pairs is presented in gii(r). Simi¬ 
larly to g{r), g 11 (r) is increasingly distorted and compressed in the compressional quadrant with increasing Pe, forming 
a boundary layer. At higher Pe, the pair structure remain approximately unchanged. In the extensional quadrant, the 
pair breakup point shifts from the extensional axis towards the velocity (x-) direction due to the lubrication interac¬ 
tions, with a clear tail of high probability outlining the trajectory of small-small pair disengagement. The distribution 
of the small-large particle pairs in gnir) shows a similar structural distortion in the compressional quadrant with 
increasing Pe. Moreover, in the extensional quadrant, the trajectory of particle disengagement is more diffusive com¬ 
pared to gii(r) at the same Pe. This suggests that particle movement in bidisperse suspensions are facilitated by the 
breakup of small-large particle pairs, and partially explains the mutual enhancement of long-time self-diffusivity in 
Fig. 18. For the distribution of large-large particle pairs, giiir) also exhibits anisotropy with increasing Pe in Fig. 19. 
However, due to the limited particle number, information beyond the first coordinate shell is difficult to analyze. 

Fig. 20 displays the total and partial pair-distribution function projections in the xz-plane. Unlike the xy-plane 
projections in Fig. 19 which exhibits strong anisotropy, the suspension structures here are less sensitive to Pe. With 
increasing Pe, the particles are compressed towards each other, which is evidenced by the thinning and brightening of 
the hrst coordinate shells. More interestingly, at higher Pe > 100, gnir) shows a belt of particle enrichment along the 
flow direction, while and g 22 {r) exhibit a corresponding particle depletion. This indicates that the small-large 

pairs are preferred in the xz-plane, and that the shear flow promotes species mixing in the flow direction. 

Fig. 21 shows the projection of g{r) and gap{r) in the yz-plane. With increasing Pe, the shear flow also compresses 
the particle pairs in this plane without apparent anisotropy. Note that even at Pe = lO'^, the suspension does not exhibit 
string ordering [96] due to the His. The lack of structural formation is also confirmed by the continuous increase of 
the long-time self-diffusivities and with Pe in Fig. 18. 

7. Conclusions 

In this work we presented the Spectral Ewald Accelerated Stokesian Dynamics (SEASD) for dynamic simulations 
of polydisperse colloidal suspensions. Using the framework of Stokesian Dynamics (SD), SEASD can accurately and 
rapidly compute His in dense polydisperse suspensions. Other features of SEASD include (0 direct inclusion of the 
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solvent compressibility and pressure evaluations; (//) the use of the Spectral Ewald (SE) method for accurate mobility 
computation with flexible error control; (Hi) a far-fleld preconditioner to accelerate the convergence of the nested iter¬ 
ative scheme; (iv) GPGPU accelerated mobility evaluation for almost an order of magnitude speed improvement; and 
(v) the incorporation of SEASD-nf, an extension of the mean-field Brownian approximation of Banchio & Brady [39] 
to polydisperse suspensions. 

We extensively discussed the accuracy of mobility computation using the SE method, established the baseline for 
parameter selection, and demonstrated the adequate accuracy in the GPU single precision (SP) mobility computa¬ 
tion. We found that compared to the full SEASD computations, SEASD-nf can achieve significant speedup without 
substantially sacrificing accuracy. Indeed, for all the dynamic simulations in this work, the SEASD and SEASD-nf 
results agree satisfactorily. In addition, we verified the 0{N log N) computational scaling of SEASD and SEASD-nf 
in dynamic simulations. 

We rigorously validated SEASD and SEASD-nf for monodisperse and bidisperse colloidal suspensions via: (i) 
the short-time transport properties, (ii) the equilibrium osmotic pressure and viscoelastic moduli, and (Hi) the steady 
Brownian shear rheology at 0 = 0.45. Eor (/), the SEASD diffusivities and shear viscosity agree with the conventional 
SD calculations. The SEASD sedimentation velocity differ qualitatively from the SD results due to the absence of a 
mean-field quadrupole term in the mobility computation. Eor the bulk viscosity computation, different procedures to 
eliminate the spurious His lead to slight differences in the SEASD and the SD results. In (ii), SEASD and SEASD-nf 
reproduced the equilibrium suspension osmotic pressure for monodisperse and bidisperse suspensions within the error 
tolerance, with the SEASD data higher. Eor the steady shear rheology in (Hi), the agreement between SEASD-nf and 
SEASD is satisfactory in the suspension mechanics, dynamics, and structures. Moreover, we found that the particle 
size polydispersity reduces the suspension viscosity and osmotic pressure, and enhances the long-time translational 
self-diffusivities of both species. Our rheological simulations also improve our understanding on the structure, dy¬ 
namics, and rheology of polydisperse suspensions. 

The SEASD and SEASD-nf developed in this work are important tools for studying dynamics of dense, polydis¬ 
perse colloidal suspensions, and have significantly extended the parameter space accessible to computational studies. 
Eor example, they can provide otherwise inaccessible details on a wide range of experimental observations including 
the yielding phenomena in glass rheology and the continuous and discontinuous shear-thickening. 

Einally, through SEASD and SEASD-nf we demonstrated the generality and versatility of the SD framework, 
particularly the splitting of the far- and near-fleld interactions: with a suitable far-fleld computation, the lubrication 
interactions can be added pairwise for free. We believe that many far-fleld HI computational methods can and should 
be used with the SD framework to expand their accessible parameter range, particularly for dense systems. 
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